Managed peer-to-peer content backup service system and method using dynamic content dispersal to plural storage nodes

ABSTRACT

Systems, method, computer program stored on computer readable media, and business method for providing and operating a distributed network based secure storage of business or consumer digital data or content. System, method, computer program stored on computer readable media and business model for dynamically managed peer-to-peer media content backup that uses a plurality of subscriber personal computer based storage devices to store backups of other subscriber data in a manner that is secure and redundant.

FIELD OF THE INVENTION

This invention pertains generally to systems and methods for distributed network based secure storage of consumer digital data or content, and more particularly to a system, method, computer program stored on computer readable media and business model for dynamically managed peer-to-peer media content backup that uses a plurality of subscriber personal computer based storage devices to store backups of other subscriber data in a manner that is secure and redundant.

BACKGROUND OF THE INVENTION

As more people begin to use digital cameras, digital video cameras, electronic music players, or other types of electronic devices, information appliances, and the like, they are generating larger and larger amounts of digital data and other content. This data and content may usually include content that is associated with irreplaceable memories such as digital photographs and videos, in addition to music that has been purchased through on-line music stores or other providers. The value associated with this content may be sentimental and emotional, especially for content that is irreplaceable if lost, or may be financial, as for music, video, or other multi-media content that would represent a significant financial loss in the event it were lost and needed to be replaced.

It may also be appreciated that as these cameras, music players or other devices and information appliances proliferate and are provided with simple intuitive interfaces, they may be used with increasing frequency by users without sophisticated computer skills, and/or by users who may not appreciate the potential for loss of the data or content that may be inherent in either single device or single physical or geographical location storage.

Many users, and perhaps the vast majority of users, never transfer any of their digital content to a physical representation (e.g. photo prints), and never back-up their data or content to a truly safe environment, or in any kind of a redundant manner that may guarantee with absolute or high probability, an ability to recover the data or content should a data loss occur. As a higher percentage of this content or other digital asset is only stored in the digital world, loss of this content or these assets due to local computer hard disk failure, computer virus or other malicious code or hacker attack, physical computer or information appliance theft, or fire, water or other natural disaster becomes a truly catastrophic event.

As more and more users find that they or others have lost personal documents either in the form of digital content or even in the form of traditional film or paper photographs, videos, and other personal or family documents, more and more people are looking for solutions to back-up and prevent the loss of their own photos, videos, music and other content. People are frequently reminded of their own potential vulnerability to loss when they watch broadcast news, browse the Internet, read books or periodicals, or otherwise become aware of home fires, floods, hurricanes, tornadoes, home invasions, or just general thefts and break-ins, the result of which is the loss of digital content or assets as well as of non-digital assets that might have been converted to a digital form through a scanning or other paper or printed media to digital conversion process. Therefore while the problem with conventional storage and backup may be seen to apply primarily to content or other assets that exist in digital form, it may be appreciated that a needed solution extends to content that may be placed into digital form so that it may be stored in a manner that reduces the likelihood of loss.

Conventional existing digital content back-up solutions are less user friendly and frequently require users to have some computer technical knowledge and often to have a strong technical knowledge and ability in order to effectuate even a local backup of the digital content or asset by such means as copying or writing (e.g., burning) the content to an optical media such as to a CD or DVD, copying the content or asset to a directly attached local storage such as for example to an external Universal Serial Bus (USB) hard disk drive, or copying the content or asset between multiple personal computers (PCs). These conventional attempted solutions also frequently require that a person purchase and then attach some form of external storage device beyond that which was supplied with the computer (if any), and then when their content is backed-up, that they find some place to safely store their backed-up content, and maintain it in a manner that does not subject it to damage or being overwritten.

Even when a person has purchased appropriate storage devices, and where required, a software solution to aid with performing the backup, the requirement to setup external storage devices is a significant deterrent to performing the backup for the typical user. Additionally, the physical storage solution that a user chooses is often not adequate to protect against common losses. For example, theft, fire, or water damage will often target or affect all the computer and entertainment equipment in the consumer's home, which will likely include the backed-up device and potentially the backup media if separated from the backup device. Viruses may also be present for long periods of time on the user's machine before detection and can infect the backup material and files as well as the original machine. Therefore, even when a person has been diligent about backing up the digital content or asset, it may still be subject to partial or complete loss using conventional practices, systems, and methodologies.

In a partial but largely unsuccessful attempt to solve at least some problems associated with digital content, a limited number of online backup techniques have emerged in an attempt to solve some of the problems associated with the existing in-home or consumer back-up solutions. Some of these solutions attempt to provide storage outside the home to alleviate the concerns of fire or water damage and theft, but they often require the user to actively manage their content backup process. For example, the user may usually still need to interact with the online storage site to actively copy the digital content to be backed-up, again requiring some degree of technical understanding that may lie outside of a non-technical consumer's expertise.

The operating or business model of these backup services and sites is based on the idea that consumers receive a limited amount of storage space (typically between about 3-5 GB) for free and then need to pay a monthly (or other periodic) fee as their consumed storage goes up beyond the free allocation. Since digital still cameras and digital video cameras are producing higher resolution content, up to perhaps 8 megapixels per still image from 3 megapixels per still image only a few years ago, increasingly the user may quickly exceed the free storage space allocation, and be subject to monthly excess storage fees.

Relatively new solutions from computer hardware and software providers, manufacturers, and/or vendors (such as for example from Apple Computer, Microsoft, and independent PC manufacturers) are looking to solve the problems of requiring users to manage their backup process. The providers, manufacturers, and/or vendors provide solutions that act at least somewhat autonomously by automatically backing-up content on the personal computer (PC) or other information appliance as the user uses their computer. However, although these backup processes may appear to be automated and may seem to an ordinary consumer to provide all of the data protection that is needed, all of these proposed solutions use the same storage device (such as the single hard disc drive) that is being used for storing the original content. While this provides a solution that enables local retrieval of accidentally erased content, it does not protect against any other type of disaster or loss, including for example, losses that are due to hard disk drive hardware or controller failures, theft, fire or water damage, virus or malicious code attack, or a plethora of other computer problems or failure modes.

There have been some attempts to use information dispersal as an aid to achieve some measure of security or fault tolerance. One example of a conventional information dispersal approach and algorithm is suggested in the paper by Michael O. Rabin, entitled “Efficient Dispersal of Information for Security, Load Balancing, and Fault Tolerance” (Journal of the Association for Computing Machinery, Vol. 36, No. 2, April 1989, pp. 335-348), which is incorporated by reference herein and hereinafter referred to as Rabin or the Rabin paper or reference. However, this approach alone does not take into account the needs of a consumer directed backup system where some nodes may be determined to be unreliable, nor the benefits of and needs for dynamic redispersal of information over time. It also does not take into account different redundancy requirements that may exist in a consumer oriented managed peer to peer backup service.

Another attempt to implement a file sharing system using a peer-to-peer (P2P) approach is described in a paper by Andrew Tytula entitled “Peer-to-Peer File Sharing System using an Information Dispersal Algorithm”, written as part of the requirements for a Carleton University 95.495 Honors Project under the supervision of Professor Tony White.

A further description of some aspects of distributed backup is provided in a set of notes available on the web entitled “Distributed Backup through Information Dispersal” by Giampaolo Bella (giamp@dmi.unict.it), Costantino Pistagna (pistagna@dmi.unict.it), and Salvatore Riccobene (sriccobene@dmi.unict.it), all associated with the Università Degli Studi di Catania.

Unfortunately, none of these attempted distributed storage solutions provides the features and capabilities needed for an on-line backup storage service that is based on consumer storage devices, is free to the user, and offers retrieval and recovery features.

There remains therefore a need for a system, system architecture, and method that overcomes these problems and limitations of conventional systems and methods.

SUMMARY

In one aspect, an embodiment of the invention provides a server computer for operating a distributed data storage system having data security, redundancy, and retrieval features, the server including: a processor and a memory coupled to the processor; a network communications interface for coupling the server computer to a network; a database for storing data pertaining to the distributed storage in the distributed data storage system and coupled to or coupleable with the processor; a network node reliability monitor for monitoring the reliability of the plurality of nodes on which the data is stored and for generating storage node reliability information; and an information dispersal and control unit for initially dispersing data for backup storage to a plurality of network storage nodes and for dynamically redispersing the data over time according to the storage node reliability information.

In another aspect, an embodiment of the invention provides a system for operating a distributed data storage system having data security, redundancy, and retrieval features, the system comprising: a server computer including: a processor and a memory coupled to the processor; a network communications interface for coupling the server computer to a network; a database for storing data pertaining to the distributed storage in the distributed data storage system and coupled to or coupleable with the processor; a network node reliability monitor for monitoring the reliability of the plurality of nodes on which the data is stored and for generating storage node reliability information; and an information dispersal and control unit for initially dispersing data for backup storage to a plurality of network storage nodes and for dynamically redispersing the data over time according to the storage node reliability information; and a plurality of user nodes, at least a first one of the nodes including a first user interface adapted for a first user to identify a data set for backup storage and at least a second and third different ones of the nodes adapted for storage of a portion of the first user data to be backed up.

In another aspect, an embodiment of the invention provides a method for maintaining reliable distributed storage on a network comprising a plurality of data storage nodes, the method comprising: dispersing the data to data storage nodes according to the current dispersement strategy; monitoring and verifying the continued reliability of each peer storage node on which a user data is stored; determining if a storage node has become unavailable or unreliable; and redispersing the data to different storage nodes if it is determined that a storage node has become unreliable, and maintaining the current data dispersement if the storage nodes on which the data is stored are not determined to be unreliable.

In still another aspect, an embodiment of the invention provides a business method for generating monetary revenues from a distributed data storage system service having data security, redundancy, and retrieval features, the method comprising: providing a managed consumer backup service to a consumer without a user fee in exchange for the user providing storage for at least one other different user data; presenting advertisements to a user when the user interacts with the storage system service; and collecting revenues from the entities placing the advertisements.

In one aspect, an embodiment of the invention provides a computer program stored on a computer readable media storing one or more procedures or methods of the invention.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagrammatic illustration showing a first embodiment of a system configuration and architecture.

FIG. 2 is a diagrammatic illustration showing a second embodiment of a system configuration and architecture including aspects of an exemplary database.

FIG. 3 is a diagrammatic illustration showing a third embodiment of a system configuration and architecture including aspects of an exemplary database and optional compression/decompression, encryption/decryption and web access node features.

FIG. 4 is a diagrammatic illustration showing an embodiment of a database.

FIG. 5 is a diagrammatic illustration showing an embodiment of a database user key table, a file backups or storage vector table, and a password Hash storage table.

FIG. 6 is a diagrammatic illustration showing an embodiment of an exemplary use scenario and associated operation.

FIG. 7 is a diagrammatic illustration showing an embodiment of an exemplary method for dynamic data or information dispersal and control operation.

FIG. 8 is a diagrammatic illustration showing an embodiment of a method for retrieving previously stored user data from the inventive system and backup storage service.

DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS OF THE INVENTION

Various aspects, features, and embodiments of the invention are now described relative to the figures.

Contemporary computers sold today usually include at least an 80 GB hard disk drive even for low-end notebook computers, and frequently 300 GB or more for mid-range desktop computers. Many users will not consume all of the hard disk drive space on their computer in normal use, especially for the higher capacity hard disk drives. Furthermore, additional internal hard disc drives, external hard disc drives, and various forms of network attached storage are increasingly available.

Therefore, although users may be generating more digital content than the couple of gigabytes of content that might routinely be provided by the free conventional online storage and backup providers, they may typically still have a lot of available computer or other information appliance based storage space in their home. This is evidenced by the fact that Apple, Microsoft, and others are beginning to leverage the available storage space on the user's own hard disk drive to perform on-machine backup of the user's data. These leveraged storage techniques by manufacturers and vendors may or may not show the user the automated on-machine back-up so that the user may not actually be aware of how the back-up is occurring or the amount of space available for the user's own use.

In one non-limiting embodiment of the system described here relative to FIG. 1, the system 100 includes a Server 102 based management block or manager 104 which may include a database 106, a dynamic information dispersal and control block 108, a node reliability monitor block 110, a processor 112 and processor associated random access memory 114, and optionally a separate management storage such as a local hard disc drive or other storage device 116. It may be appreciated that in one non-limiting embodiment of the invention, the database 106 may be stored on either the manager storage device 116 or on one or more storage device 120. For example, the database 106 may be redundantly stored in the server in a RAID array to provide redundant and recoverable storage of the database. It may also be appreciated that for the Server 102 in this embodiment, as well as in other embodiments of the invention, individual functional blocks may actually be split into an n-tiered web architecture across multiple individual servers to achieve fault tolerance and/or load balancing without deviating from the current invention. The management block 104 may also include a network communications interface 103 and communications block 105 to enable coupling of the server 102 to the network 124, which may be the Internet, a corporate or other intranet, or any other network of computers, information appliances, storage devices, media generating devices, or other devices or subsystems that may store or generate data or information, including but not limited to files, sets of files, documents, or other multi-media, pictures, video, or other content.

Database 106 may be defined in non-volatile memory of storage device or subsystem 120 and stores user information, encryption keys when data is stored in an encrypted manner, information identifying peer storage nodes, historical monitoring information indicating availability and reliability of peer nodes 130-N (and optionally other nodes that may be present on the network but that do not presently store data and are not associated with registered users (or their surrogates)), file backup information identifying details of where particular user data or portions thereof are stored amongst the peer nodes, and optional information pertaining to one or more of folder hierarchy and metadata for files of a given user, any relevant data objects for a file that are often used for presentation of the file, user share information if sharing is implemented, and/or user tag related information. Alternative embodiments of the database 106 including an embodiment of a database 206 are described relative to the embodiments of FIG. 2 through FIG. 5 and in Table 1.

Database 106, 206 may support various queries. By way of example, but not limitation, examples of high-level queries may include: browsing the devices (nodes) a user owns, browsing the roots of a given node, browsing the contents of a folder, finding all files that match a tag or other identifier, displaying all or some subset of the tags for a file, displaying all or some subset of the shares a user has permission to, and displaying the other users invited to view a particular user's shares. These are merely examples of the queries that may be made to the database. It may be appreciated that many types of database structures are known in the art that provide a variety of data mining and query operations. These database features may readily be used with a database storing the items described elsewhere in this specification and are not described in further detail here.

In at least one embodiment, the dynamic information dispersal and control block (DIDCB) 108 is provided within the manager and is responsible for performing the information or data dispersal computation initially and on a continuing basis for each user data dispersively stored on the peer nodes. The DIDCB receives information either directly from the node reliability monitor or from the database which may store historical reliability and availability information for current peer nodes (as well as potential usable nodes) anywhere in the world. As described elsewhere herein, the DIDCB dynamically controls the particular nodes storing each user's data and the number of nodes that are used for the storage. In one embodiment, the node reliability monitor sends signals or pings to the nodes, and waits for a response, to determine if particular network nodes are currently on-line and available for access (such as for read and write access). In some non-limiting embodiments of the invention, the node reliability monitor may interact with a node by reading data from, writing data to, or both writing and reading data, so as to determine not only that the node is active and on-line but also that the storage device is responsive and to measure the effective bandwidth of the node.

The management server may be configured so that it is capable of brokering the insertion (or upload) of data from any computer, information appliance, hand-held device, PDA, or terminal no matter how smart or dumb or how thick or thin that terminal or device may be. In at least one embodiment, an account and password may be established from that device or terminal and if it is too thin or dumb of a device, the actual processing may be performed by or brokered by the server, and later the backed up data may be retrieved back to a home computer, business computer, or third party device that has sufficient capabilities to receive and store the data.

In one embodiment, none of the actual user original data is stored in or passes through the management server. When a user client device has sufficient processing capability to perform the mathematics of the information dispersal algorithm and any optional compression and encryption that may be desired or required (and on the restore side sufficient processing capability to perform the decryption and decompression) it is advantageous to leverage the processing capabilities of that user client computer, information appliance, or other device, as well as the bandwidth capabilities of the peer nodes. Sufficient processing capability may for example include processor type and speed and sufficient processor coupled random access memory. There are no absolute requirements, as the compression and encryption (and decryption and decompression) may simply be performed more quickly on a higher performance computer and less quickly on a lower performance computer. As it is the user's own computer, they will usually be accommodating to slower processing since they will be aware that it is their computer that is the limitation and not the free service. Performing the encryption/decryption and compression/decompression on the client device also has the advantage that the smaller compressed size will save network bandwidth and the encryption of at least full size files or content will provide security. For content that is more often pictorial in nature (e.g. digital photography), thumbnail versions of the content may also be generated on the user or source node and uploaded to the management server for use in later user presentation. Such thumbnails may alternatively be generated on the server by uploading the full images to the server first, but this is disadvantageous at least because of the server processing power and bandwidth consumed. All the communication between the management server and the end nodes is advantageously performed via HTTP over SSL (or using other security means) to ensure content protection between the management service and the client nodes.

The management server controls which set of nodes a user's content is stored on, and stores the vectors and the keys for the user so that the user himself can repair or reconstruct his data in the event of failure or other need. It may be appreciated that access is protected by the primary key which is the user's password. Advantageously, as in all password access controlled systems, the user will store the password only in their brain, not on their computer, and change it frequently. Biometric or other user authentication may also or alternatively be used as well as temporal or second stage authentication systems.

The management server only stores the MD5 or SHA1 hash of the user's password (or other security or access identifier), so that in the event that the management server is compromised, the actual password cannot be obtained by others or compromised.
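The hash-only storage just described can be sketched as follows. This is a minimal illustration and not the specification's implementation: the function names `password_digest` and `verify_password` are hypothetical, and the digest is unsalted only because the text names a plain MD5 or SHA1 hash.

```python
import hashlib
import hmac


def password_digest(password: str, algorithm: str = "sha1") -> str:
    """Return the hex digest that the server would store instead of the password.

    `algorithm` is "sha1" or "md5", the two hashes named in the text.
    """
    return hashlib.new(algorithm, password.encode("utf-8")).hexdigest()


def verify_password(candidate: str, stored_digest: str, algorithm: str = "sha1") -> bool:
    """Check a login attempt against the stored digest (hypothetical helper)."""
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(password_digest(candidate, algorithm), stored_digest)
```

Only the digest is written to the database, so a verification needs the candidate password from the user and never the original password from storage.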

The management server may be configured so that it is capable of brokering the insertion (or upload) of data from any computer, information appliance, hand-held device, PDA, or terminal no matter how smart or dumb or how thick or thin that terminal or device may be. The system may create an account and password from that device and perform all of the processing on the server, and retrieve data back to a home computer, business computer, or third party device.

The database includes keys and storage vectors and some form of Hash of the password, such as for example an MD5 or SHA1 Hash of the password. It may also include features to support the file sharing and web access features, as well as other optional data. In one embodiment, the optional data may include metadata about the backup file set, sharing permissions, and other data to support various features.

In one embodiment the information dispersal computation provides a computational layer below the dynamic monitoring control and dispersion or re-dispersion, and may be defined by the Rabin algorithm or by variations on that algorithm as may be known in the art, with the added component of dynamic monitoring of the system and dynamic modification of the data or information dispersal on a continuing basis. In one embodiment, each of the storage nodes is queried periodically or according to some other rule or policy to verify it is on-line and optionally to determine or verify other characteristics, such as but not limited to bandwidth, capacity, error states or status, and/or any other information that may be useful in determining a reliability or suitability for continued storage of user data already on that node or for new data that may need to have a node assigned for it. This reliability may be determined and stored in the database as a score. In one embodiment, reliability is determined for each storage node by sending a ping signal to the node at an interval of between about a few seconds and a few days; in another embodiment the interval may be between 10 seconds and one day (24 hours); in yet another embodiment, the interval is between about 30 seconds and one day; and in yet another embodiment the interval is between about 30 seconds and about 6 hours. In still another embodiment, the interval is between about 1 minute and about 4 hours. It may be appreciated that the ping frequency should be sufficient to maintain the reliability of the storage, that no particular set schedule may be required, and that the schedule may be different for different nodes and/or for different parts of the network where historical reliability has been particularly high or low so that less frequent or more frequent monitoring may be advantageous. In any event, the nodes are monitored and the dispersement dynamically modified during the storage life of the user data.
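One way to turn the ping results into a stored reliability score is sketched below. The specification does not say how the score is computed, so the exponential moving average, the `alpha` weight, the 0.5 threshold, and the names `NodeRecord`, `record_ping`, and `nodes_to_replace` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class NodeRecord:
    """Hypothetical database row for one peer storage node."""
    node_id: str
    score: float = 1.0  # 1.0 = answered every ping so far, 0.0 = never answers


def record_ping(node: NodeRecord, responded: bool, alpha: float = 0.1) -> float:
    """Fold one ping result into the node's score (assumed moving average)."""
    sample = 1.0 if responded else 0.0
    node.score = (1.0 - alpha) * node.score + alpha * sample
    return node.score


def nodes_to_replace(nodes, threshold: float = 0.5):
    """Return ids of nodes whose score fell below the assumed threshold.

    Data held on these nodes would be candidates for redispersal.
    """
    return [n.node_id for n in nodes if n.score < threshold]
```

With these assumed parameters a node that starts at 1.0 and then misses every ping drops below 0.5 on the seventh miss, so a short outage does not trigger redispersal but a sustained one does.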

In another embodiment other known information dispersal algorithms, procedures, or routines may be utilized. Although the information dispersal algorithm identified by Rabin may be applied to the present invention, it should be appreciated that either the Rabin information dispersal algorithm or other IDAs as may be known in the art are modified and/or applied in a different manner to the system and method of the present invention. These differences are described elsewhere in this specification, and the overall dynamic information dispersal and control operation is set forth in the flow chart diagram of FIG. 7 described in detail hereinafter.

Rabin describes an information dispersal algorithm (IDA) that breaks a file F of file size or length L=|F| into n pieces F_(i), where 1≦i≦n, each of the n pieces F_(i) being of length |F_(i)|=L/m, so that every m pieces suffice for reconstructing the file F. The Rabin file dispersal and reconstruction algorithms are considered to be computationally efficient. It may be noted that the sum of the lengths |F_(i)| is (n/m)·L, and since n/m can be chosen to be close to 1, the Rabin IDA is also considered to be space efficient. The Rabin IDA may be applied to applications for secure and reliable storage of information in computer networks and even on single disks, to fault-tolerant and efficient transmission of information in networks, and to communications between processors in parallel computers.
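The space cost stated above can be checked with a short calculation. The helper names and the example figures (a 1,000,000 character file, m=10, n=14) are illustrative assumptions; piece lengths are rounded up because a real file length need not be a multiple of m.

```python
import math


def piece_length(file_length: int, m: int) -> int:
    """Length of each piece F_i, i.e. L/m rounded up to whole characters."""
    return math.ceil(file_length / m)


def total_stored(file_length: int, n: int, m: int) -> int:
    """Total characters stored across all n pieces, about (n/m) * L."""
    return n * piece_length(file_length, m)


# Example: any 10 of 14 pieces rebuild the file, so up to 4 nodes may be lost.
L, m, n = 1_000_000, 10, 14
overhead = total_stored(L, n, m) / L  # ratio n/m = 1.4
```

For comparison, tolerating the loss of 4 nodes by storing whole copies would need 5 full copies of the file, a ratio of 5 rather than 1.4.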

Rabin describes a procedure for splitting and later recombining the split files, which procedure is described below. It may be appreciated in light of the description provided here that although the inventive system and method may utilize the Rabin algorithm and procedure as summarized below, the inventive system and method are not limited to this information dispersal algorithm or to the particular procedure or method for splitting and recombining files or blocks of data. Rather the Rabin algorithm and procedure is exemplary of one possible procedure.

Rabin considers a file F=b₁, b₂, . . . , b_(N), that is, a string of characters, and assumes that one wants to disperse the file (or as applied to the invention, some file, set of files, block of data, or other information set) F, either for storage or for transmission, under the given condition that with overwhelming probability no more than k pieces will be lost through node storage or communication-path failures.

The characters b_(i) in the string may be considered as integers taken from a certain range (for example, a range [0 . . . B]). For example, if the b_(i) are eight-bit bytes, then 0≦b_(i)≦255. The algorithm is not limited to any particular number of bits or bytes. One takes a prime number p, where B<p; for bytes, p=257 will suffice. It may be desirable to choose a prime larger than the smallest prime p with B<p that will suffice. With p chosen such that p=257 there is an excess of one bit per byte. The Rabin IDA may be implemented in fields GF(2^(s)), where s=8 for bytes, without any excess. In mathematical terms, F is a string of residues modulo p, that is, a string of elements in the finite field Z_(p), and the following computations described further in Rabin are in Z_(p), that is, mod p.

First, choose an appropriate integer m so that n=m+k satisfies n/m≦1+ε for a specified ε>0. Choose n vectors a_(i)=(a_(i1), . . . , a_(im)) ∈ Z_(p)^(m), 1≦i≦n, such that every subset of m different vectors is linearly independent. Alternatively, it suffices to assume that with high probability, a randomly chosen subset of m vectors in {a₁, . . . , a_(n)} is linearly independent. The Rabin paper shows how to satisfy each of these conditions.
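One simple way to obtain n vectors of which every m are linearly independent is sketched below. It uses Vandermonde vectors a_(i)=(1, x_(i), x_(i)², . . . , x_(i)^(m−1)) with distinct x_(i) in Z_(p). This is an assumption chosen for brevity and is not necessarily the construction given in the Rabin paper; the function names are hypothetical.

```python
from itertools import combinations

P = 257  # prime larger than the largest byte value


def vandermonde_vectors(n: int, m: int, p: int = P):
    """Return n vectors in Z_p^m, a_i = (1, x, x^2, ..., x^(m-1)) with x = i."""
    if n >= p:
        raise ValueError("need n distinct nonzero values below p")
    return [[pow(x, j, p) for j in range(m)] for x in range(1, n + 1)]


def det_mod(matrix, p: int = P) -> int:
    """Determinant of a square matrix mod p by Gaussian elimination."""
    a = [[v % p for v in row] for row in matrix]
    size = len(a)
    det = 1
    for col in range(size):
        pivot = next((r for r in range(col, size) if a[r][col]), None)
        if pivot is None:
            return 0  # singular: the rows are linearly dependent mod p
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det = det * a[col][col] % p
        inv = pow(a[col][col], p - 2, p)  # inverse via Fermat's little theorem
        for r in range(col + 1, size):
            factor = a[r][col] * inv % p
            a[r] = [(x - factor * y) % p for x, y in zip(a[r], a[col])]
    return det % p


vectors = vandermonde_vectors(n=5, m=3)
independent = all(det_mod(list(s)) != 0 for s in combinations(vectors, 3))
```

The exhaustive check over all subsets is only practical for small n and m; it is included to show what the independence condition means, not as something a server would run per file.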

The file F is segmented into sequences of length m. Thus, file or block of data F is given by the expression:

$F = (b_{1}, \ldots, b_{m}),\ (b_{m+1}, \ldots, b_{2m}),\ \ldots$

Now, denote the first segment by S₁=(b₁, . . . , b_(m)), the second by S₂, etc. Then, for i=1, . . . , n,

$F_{i} = c_{i1}, c_{i2}, \ldots, c_{iN/m},$

where

$c_{ik} = a_{i} \cdot S_{k} = a_{i1} \cdot b_{(k-1)m+1} + \cdots + a_{im} \cdot b_{km}$

It follows that the size of each piece F_(i) is given by |F_(i)|=|F|/m. If m pieces of file or data block F, for example the m pieces F₁, . . . , F_(m), are given, one may reconstruct the file or block of data F as follows. Let A=(a_(ij))_(1≦i,j≦m) be the m×m matrix whose ith row is a_(i). Rabin shows that:

$A \cdot \begin{pmatrix} b_{1} \\ \vdots \\ b_{m} \end{pmatrix} = \begin{pmatrix} c_{11} \\ \vdots \\ c_{m1} \end{pmatrix}$ and hence $\begin{pmatrix} b_{1} \\ \vdots \\ b_{m} \end{pmatrix} = A^{-1} \cdot \begin{pmatrix} c_{11} \\ \vdots \\ c_{m1} \end{pmatrix}$

Next, denote the ith row of A⁻¹ by (α_(i1), . . . , α_(im)); then in general, for 1≦k≦N/m, the following expression holds:

b_(j)=α_(i1)c_(1k)+ . . . +α_(im)c_(mk), 1≦j≦N,

where i=j mod m and k=⌈j/m⌉ (here we take the residues to be 1, . . . , m).

Thus one may invert matrix A once and for all, and reconstruct file or data block F by the above expression, which involves 2m mod p operations per character of file or block F. Rabin demonstrates that for sufficiently large files satisfying m²≦|F|, the operation cost of computing A⁻¹ is majorized by the cost of reconstructing F by the above expression for b_(j), even if one uses m³ operations for computing A⁻¹. Rabin shows that one can choose a₁, . . . , a_(n) so that the computation of any A⁻¹ will require just order of m², or O(m²), operations.

Both splitting up the file by the expression c_(ik)=a_(i)·S_(k)=a_(i1)·b_((k−1)m+1)+ . . . +a_(im)·b_(km), and reconstruction by applying the rows of A⁻¹ to the received values c_(1k), . . . , c_(mk), involve just inner products, so that the method is readily adaptable to vectorized, systolic, or parallel architectures.
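The splitting and reconstruction expressions above can be sketched in a few lines of Python. This is a minimal illustration only, not the implementation of the inventive system: it assumes p=257, represents each piece as a list of integers (since a value may equal 256), pads the final segment with zeros, and uses Vandermonde-style vectors a_(i)=(1, x_(i), . . . , x_(i)^(m−1)) with distinct x_(i) as one convenient way to make every m vectors linearly independent. The function names are hypothetical.

```python
P = 257  # prime just above the byte range, as in the text


def make_vectors(n, m, p=P):
    """Assumed choice: Vandermonde rows (1, x, ..., x^(m-1)) for x = 1..n.

    Any m rows with distinct x are linearly independent mod p (requires n < p).
    """
    return [[pow(x, j, p) for j in range(m)] for x in range(1, n + 1)]


def split(data, vectors, m, p=P):
    """Return one piece per vector; piece i holds c_ik = a_i . S_k mod p."""
    padded = list(data) + [0] * (-len(data) % m)  # zero-pad the last segment
    segments = [padded[k:k + m] for k in range(0, len(padded), m)]
    return [[sum(a * b for a, b in zip(vec, seg)) % p for seg in segments]
            for vec in vectors]


def invert(matrix, p=P):
    """Invert a square matrix mod p by Gauss-Jordan elimination."""
    m = len(matrix)
    aug = [list(row) + [int(i == j) for j in range(m)]
           for i, row in enumerate(matrix)]
    for col in range(m):
        pivot = next(r for r in range(col, m) if aug[r][col] % p)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], -1, p)  # modular inverse, Python 3.8+
        aug[col] = [v * inv % p for v in aug[col]]
        for r in range(m):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [(v - f * w) % p for v, w in zip(aug[r], aug[col])]
    return [row[m:] for row in aug]


def reconstruct(pieces, vectors, length, p=P):
    """Rebuild the data from any m pieces and the m vectors that made them."""
    a_inv = invert(vectors, p)  # computed once for the whole file
    out = []
    for k in range(len(pieces[0])):
        column = [piece[k] for piece in pieces]  # c_1k, ..., c_mk
        out.extend(sum(a * c for a, c in zip(row, column)) % p for row in a_inv)
    return bytes(out[:length])


data = b"peer-to-peer backup"
n, m = 5, 3                      # tolerates the loss of k = n - m = 2 pieces
vecs = make_vectors(n, m)
pieces = split(data, vecs, m)
keep = [0, 2, 4]                 # any m surviving pieces will do
restored = reconstruct([pieces[i] for i in keep],
                       [vecs[i] for i in keep], len(data))
assert restored == data
```

Note that the original length is passed to the reconstruction step so that the zero padding can be dropped; where that length is recorded is left open here.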

Rabin also observes that it is possible to use other fields instead of Z_(p). For example, for 8-bit bytes one can directly use the field E=GF(2⁸) of characteristic 2 and having 256 elements. All one needs is an irreducible polynomial p(x)∈Z₂[x] of degree 8 to allow one to compute effectively in field E.
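By way of illustration only, arithmetic in GF(2⁸) may be sketched as follows. The sketch assumes the irreducible polynomial x⁸+x⁴+x³+x+1 (0x11B), which is one well-known choice and is not named in the description above; any irreducible polynomial of degree 8 over Z₂ would serve. The function names are hypothetical. Because every field element fits in exactly one byte, pieces computed in this field carry no excess bit.

```python
IRREDUCIBLE = 0x11B  # x^8 + x^4 + x^3 + x + 1 (assumed choice of p(x))


def gf256_add(a, b):
    """Addition (and subtraction) in GF(2^8) is bitwise XOR."""
    return a ^ b


def gf256_mul(a, b, poly=IRREDUCIBLE):
    """Shift-and-add multiplication, reducing by p(x) whenever degree 8 appears."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:      # degree reached 8: subtract (XOR) the polynomial
            a ^= poly
        b >>= 1
    return result


def gf256_inv(a):
    """Multiplicative inverse as a^254, since a^255 = 1 for every nonzero a."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(2^8)")
    result = 1
    for _ in range(254):
        result = gf256_mul(result, a)
    return result


assert gf256_mul(0x57, 0x83) == 0xC1
assert gf256_mul(0x53, gf256_inv(0x53)) == 1
```

With these operations substituted for the mod p additions, multiplications, and inverses, the same inner-product splitting and matrix-inverse reconstruction apply unchanged.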

Therefore, one may use the vector equations described in Rabin. The individual vectors that are generated are stored in the database 106, 206 and used during retrieval, restoration, and/or redispersing steps. Existing storage vectors may be utilized for retrieval or restoration when still current, or the storage vectors may be recomputed according to the dynamically determined dispersal or redispersal strategy that may be needed at the time.

Again, it may be appreciated in light of the description provided here that the Rabin algorithm and procedure, as well as alternative procedures as may be known in the art, may be used as a component for splitting and recombining of files or blocks of data in the inventive dynamic data or information dispersal and maintenance procedure.

Returning to the description of the exemplary system, processor 112 and processor associated random access memory 114 may be conventional single core or multiple core processors or microprocessors and on chip or off chip random access memory as are known in the computing arts. The processor may also include or be coupled to special purpose logic or co-processors that may implement particular compression, decompression, encryption, and/or decryption in hardware or as a combination of hardware and firmware.

Manager storage may be any storage device such as a hard disc drive or a plurality of physical or logical drives, and may be used by the processor 112 for the nonvolatile storage of computer program code, operating system elements, data, temporary storage or virtual memory, and for other storage as is known in the computer arts. In one embodiment, manager storage is used to store a server application program 111 for controlling the overall operation of the server 102 and for interacting with the local client program, web based programs, or other generic or specialized interfaces presented by the peer nodes and optionally with other network 124 elements.

The management block 104 may optionally include either or both of data compression/decompression block(s) or module(s) 118 a, 118 b, and data encryption/decryption block(s) or module(s) 119 a, 119 b. Alternative embodiments of the system manager block that include one or more of data compression/decompression module(s) 118 and data encryption/decryption module(s) 119 are described with reference to the system configuration and architecture of FIG. 3 hereinafter. It may be appreciated that while both data compression and data encryption are desirable and have many advantages, they are not required as part of the invention. Data decryption and data decompression are only required to retrieve or recover data that has earlier been encrypted and/or compressed, so that these processing components are also optional. It should also be appreciated that any compression, decompression, encryption, and decryption may be performed within the server 102, by processing entities coupled to the server 102, by any one of the nodes 130-1, 130-2, . . . , 130-N, or by another processing entity to which the compression, decompression, encryption, and/or decryption may be outsourced. At the same time, there are described particular implementation and operational strategies that may favor performing a particular one or a combination of these operations within the data owner's node 130-1, by the server 102, or in a somewhat more limited situation by a node different from the server or the data owner's node (e.g., web access described hereinafter relative to FIG. 3).

The system may also include a plurality N of other network nodes 130 (e.g. 130-1, 130-2, 130-3, . . . , 130-N), where the number of nodes may be variable over time as nodes are added to or removed from the network. As will be described in greater detail herein, the nature of each node may vary depending upon its primary responsibility (if any) in the network. At least some plurality of the nodes 130, but not necessarily every one of the N nodes, must support a data storage function so that a data dispersal aspect of the invention may be implemented, wherein a first user's data is backed up and stored onto a storage located at or associated with at least two (or some other plurality) other different nodes. Some of the nodes that do not have appropriate storage may participate in network activity but not act as storage nodes for other users.

In one embodiment, the network nodes comprise personal computers having a processor and memory coupled to the processor, as well as input/output devices (such as a keyboard, mouse, and display screen), a network interface (such as a NIC card or circuit) and optional peripheral devices. Those nodes acting as, or capable of acting as, storage nodes for other user data will also include a mass storage device, such as one or more rotating magnetic or optical media disc drives. Frequently, the storage device will be a hard disc drive with sufficient free space to serve as storage not only for the data of the owner of that device, but also for the dispersed storage of one or more other users' data as will be described in greater detail herein.

Embodiments of the invention are adapted for storage of digital content of all kinds, including data that was originally in an analog electronic or signal form or for paper documents that have been converted to a digital form. References to data will include any one or combination of digital or computer files, file folders, file folders and the contents thereof, multi-media content, videos, pictures, images, music files, and any other form of digital or computer readable or storable information.

In the non-limiting embodiment of the system in FIG. 1, the exemplary user node 130-1 includes a processor 132 and random access memory (RAM) 133 coupled to the processor, and at least one local non-volatile storage device 134, usually in the form of a hard disc drive, that is coupled with the processor and RAM memory over a high speed internal bus. Additional storage devices may be present such as external hard disc drives coupled by SCSI, USB, Firewire, eSATA, or any other known or to be designed interface. The user node 130-1 also includes means for connecting to or coupling with the network 124. The means for connecting may be or include a wired or a wireless connection, and utilize any known network interface card (NIC), 802.11-based wireless connectivity, broadband or satellite connection, or the like. Conventional components of personal computers are not shown or described to avoid obscuring inventive aspects of the computing device, system, and method.

In at least one embodiment of the invention, the user node 130-1 includes a specialized local client application program 135, although embodiments that do not include such a specialized local client will be subsequently described. In one non-limiting embodiment, the local client application is in the form of an applet, Java plug-in, program code that includes an ActiveX component, or other program structures that provide or will provide analogous operational features in the future.

In one embodiment, the local client application program 135 includes a communications module 136, for communicating with the peer nodes 130-2, . . . , 130-N and the management server 102, and a local file management system for backups 137. These two modules either alone or in combination execute in the user node computer processor and memory to facilitate operation of the node 130-1 relative to the server 102 and the other peer nodes 130-2, . . . , 130-N. As described herein elsewhere, the local client application (or more simply, “client”) may be downloaded from server 102 (or from any other source) during user registration with the backup service. After downloading, the client participates in the interaction between the user and the user's computer and the server. The client 135, and particularly the communications module 136, may also participate in communications between the user peer node 130-1 and others of the peer nodes 130-2, . . . , 130-N, such as for example when sending backup data segments after processing to others of the peer nodes through client network communications interface 131.

The file management module 137 may operate within the local user computer to assist in identifying files that may need to be backed up, and may include features such as file or folder searches and other administrative tasks that will facilitate informing the user of a current backup status, identifying backed up files, identifying new files that have not been backed up, and the like operations. The file management module may interact with existing elements of the operating system, utility programs, application programs or the like to efficiently identify new files that are brought into the file system of the computer either by local generation, downloading from an external source, or in any other way.

The local client 135 may optionally include either or both of a data compression/decompression module 138 and a data encryption/decryption module 139. Alternative embodiments of the system that include one or more of a data compression/decompression module 138 and a data encryption/decryption module 139 are described with reference to FIG. 3 hereinafter.

With further reference to the embodiment of FIG. 1, there are shown several exemplary user or peer nodes 130-1, 130-2, 130-3, and 130-N. Characteristics of a typical exemplary peer node 130-1 have been described and it is anticipated that many or most peer nodes may be personal computers or PCs. On the other hand the invention is not limited to personal computers as peer nodes. For example, peer nodes may include thin client devices that have relatively low level processors, little memory, and perhaps no real mass storage device for storing other different user data. One example of such a peer node device is a digital still or video camera 381 that has some internal memory for operation of the camera, possibly an on-board or externally available network connectivity (such as a Wi-Fi, 802.11x, Bluetooth, or other connectivity) and a memory card 382 for storing the still or video images. In one application of embodiments of the invention, a user, while traveling away from their home or business computer, may want to be able to upload and store new images either as a backup of internal memory of the camera as a kind of insurance, or because the user is not carrying sufficient storage capacity with him/her (such as in the form of multiple memory cards, compact flash (CF) RAM cards, CD or DVD recording device, hard disc drive storage, or a personal computer on which copies of the memory cards from the camera can be transferred). The user may therefore create a backup from the camera to the service just as he/she would from any other peer node, with the proviso that in this instance, due to the limited processing power of the camera, the server may more likely be tasked to process the image files. This server based processing may include the compression, encryption, and dispersion of the user data. In this operational scenario, the user, upon returning home (or to any other location) to their computer, may, upon providing appropriate authentication, retrieve and restore all of the image files that were earlier uploaded to the system and service.

Even though the backup was generated in multiple upload sessions, the user may retrieve and recreate the desired file system on their computer. A tool or wizard may optionally be made available on the client to assist in creating a file, folder, or other directory structure for the files. In one embodiment, the camera or other device may be manufactured to include a special purpose processor or logic to perform a hardware and/or hardware-firmware version of compression, encryption, and/or data dispersal so that the camera or other thin client device may perform more of the processing so as to alleviate at least some of and perhaps all of the processing that the server 102 might otherwise need to perform. It might be expected that this capability and service may also be useful to news organizations, photojournalists, and travelers, as well as to typical consumers. When providing this service to commercial organizations, a fee-based business model may be employed wherein commercial users are charged for data volume, storage capacity used, bandwidth, CPU cycles, or any other metric that would represent the cost of providing the service plus profit. This fee-based model may be more appropriate as the revenues from advertising may not permit recoupment of costs and the volume of storage may be relatively large.

Another possible peer node device may be a storage device with network connectivity. Such a peer node may not itself have a typical personal computer interface but would represent some network attached storage. Devices that incorporate Network Attached Storage (NAS) capability and may be directly attached to a network without an intervening computer include, but are not limited to, products like Shared Storage II™ made by Maxtor, Inc. of Milpitas, Calif., USA (see website at www.maxtor.com). Other primarily storage device type peer nodes may be used.

It may be appreciated that peer-to-peer connectivity or networking technologies have been leveraged over the past several years as a method of distributing content from a single content provider to multiple consumers of the provided content. This peer-to-peer connectivity or networking content delivery model and method have been used in very successful, although often questionably legal, deployments. The peer-to-peer connectivity or networking content delivery technologies behind these solutions enable consumer devices coupled or coupleable on the Internet (or over any other network) to communicate with each other without a host or hosted service brokering the communication or the bandwidth. For the content distribution world, this non-hosted peer-to-peer connectivity can save significant bandwidth costs for the content service provider.

While a non-hosted peer-to-peer communication and content delivery in either direction (upload to or download from) may be preferred, embodiments of the invention described herein are not limited to non-hosted or to peer-to-peer configurations or operation.

Known conventional systems and methods differ from the present invention in at least two fundamental ways. Conventional systems and methods operate in a manner where there is no centrally managed service that knows about or manages the storage or peer nodes in the network. In file swapping networks of questionable legality, the lack of a manager is ideal as it makes it difficult for any authority to monitor the network or impose any liability. In backup systems and methods, the lack of a manager creates problems. In such file backup systems and methods, peer groups of users may at least somewhat effectively store and obfuscate the data to provide some measure of security using Information Dispersal Algorithms (IDA) or procedures. Some conventional IDA algorithms and procedures and backup schemes have been described in the background. In these conventional implementations, the vectors used to store and retrieve the backed up data are stored only on the originating host, such as the user computer. As such, if the originating host is destroyed or lost, the backed up data cannot be reconstituted, resulting in an inability to retrieve or restore the backed up data. Therefore, an unmanaged conventional peer-to-peer backup system and method may not achieve the performance and capabilities desired.

By way of comparison, the system and method of the present invention store vectors in the managed service in a management service storage or database, which allows for a retrieval or restoration to work successfully even without a user needing to store the vectors anywhere else than in the management service storage.

The inventive system and method also provide management service over the peer group that may have an increased reliability and redundancy factor. Embodiments of the invention provide means by which the reliability and/or redundancy may be dynamically tuned or optimized. As described in greater detail elsewhere in this specification, one possible component for increasing reliability and redundancy is to monitor and maintain a history of the peer nodes in the network that reflects on their on-line or on-network availability or uptime as well as measures of their reliability. The manager may then recommend and control which selected peers of the available peers to store data to, both when new data enters the network, and for data that is already on the network but perhaps not in the most reliable or available peer node storage assets. This allows for historical uptime and reliability to indicate future reliability for a storage node in the network.
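One simple way the manager might turn a ping history into a node recommendation is sketched below. This is an assumption-laden illustration rather than the selection method of the invention: it scores each node by the fraction of monitored pings it answered and picks the m best-scoring nodes other than the data owner's own node. The names PingRecord, availability, and choose_storage_nodes are hypothetical, and a real service could weight recency, bandwidth, geography, or other reliability measures.

```python
from dataclasses import dataclass


@dataclass
class PingRecord:
    """One monitored ping of a storage node (hypothetical record layout)."""
    node_id: str
    responded: bool


def availability(pings):
    """Fraction of monitored pings that each node answered."""
    totals, answered = {}, {}
    for ping in pings:
        totals[ping.node_id] = totals.get(ping.node_id, 0) + 1
        answered[ping.node_id] = answered.get(ping.node_id, 0) + int(ping.responded)
    return {node: answered[node] / totals[node] for node in totals}


def choose_storage_nodes(pings, owner_id, m):
    """Pick the m most available nodes, never the data owner's own node."""
    scores = availability(pings)
    candidates = [node for node in scores if node != owner_id]
    # Highest availability first; node id breaks ties deterministically.
    candidates.sort(key=lambda node: (-scores[node], node))
    return candidates[:m]


history = (
    [PingRecord("A", True)] * 4
    + [PingRecord("B", True), PingRecord("B", False)] * 2
    + [PingRecord("C", True)] * 3 + [PingRecord("C", False)]
    + [PingRecord("owner", True)] * 4
)
print(choose_storage_nodes(history, owner_id="owner", m=2))  # ['A', 'C']
```

The same scores could be re-evaluated periodically to decide when previously stored data should be moved to better nodes.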

The management service may also use the availability or uptime history as well as the reliability information in order to change or shuffle the peer node storage locations or devices to other peer node storage locations or devices in the network if and when the reliability of an original storage node is no longer acceptable. This shuffling may be accomplished before a peer node storage device becomes unavailable or after the peer node storage becomes unavailable, as only a subset of the peer node storage devices is required to recreate, retrieve, or recover the backed up data. The inventive system and method leverage the same IDA procedure that was used to build the file from portions of the storage nodes to constantly rebuild and recreate storage nodes when some of the nodes disappear. This allows for the whole data set to be completely managed and moved around even after the data was originally stored on or inserted into the network.

With reference to FIG. 2, there is illustrated an exemplary system 201 according to a non-limiting embodiment of the invention. This embodiment is similar to the embodiment of FIG. 1 and adds additional details relative to an exemplary management database 106. Table 1 provides a summary of elements in an exemplary embodiment of the database 206 which is a particular embodiment of database 106. It will be apparent to those skilled in the art in light of the description provided here that the same or analogous operation may be achieved by a database having a different organization, different fields or records, or the like so long as any required information is available for management and control of the server and its components, of the peer nodes, of the communications, and of the backed up data itself.

References to tables may mean actual tables, lists, data structures, or other database elements that are capable of storing records, parameters, numbers, vectors, scalars, or other values in the manner of a table or plurality of tables.

This particular embodiment of the database 206 may include a Users Table 210 that is one of the primary user tables and contains an entry for each user in the system and service. In one embodiment, the table may store a single row per each unique user in the system. A Nodes Table 211 may be used to manage the known storage nodes, where the storage nodes may be local or anywhere in the world, and where storage nodes may be owned by users or may be provided by non-user entities. A Node Pings Table 212 may store historical ping information from storage nodes in the world which may have been obtained by the node reliability monitor. A Files Table 213 may store a folder hierarchy and optionally metadata for files of a given user. An optional File Objects Table 214 may store any relevant data objects for a file that are often used for presentation of the file; examples of this include thumbnails for image or video files or other documents or “snippets” of textual files. A File Backups Table 215 may store the backup vector details for each file that was backed up. An optional Shares Table 216 may store a user's defined shares when sharing is optionally provided. An optional Share Users Table 217 may store the mapping from shares to the users allowed to view them when this feature is optionally implemented. An optional Tags Table 218 may store the user's defined tags, and an optional Tag Files Table 219 may store the user's mapping between tags and files. A password hash table for storing a hash of a user password may be provided as a separate table or included within one of the afore described tables. It will also be appreciated that the database may be differently constructed so long as it includes any required information or data and that more or fewer tables may be utilized to accomplish this. Therefore it will be appreciated that although the exemplary database as described has advantageous features, the invention is not limited by any particular database structure or organization.

User Keys Table 220 may store the encryption keys used to validate the users and nodes. Different numbers of keys may be utilized depending on the actual implementation. For example, consider a situation where there are a number “u” of users. Each user may have zero (0) or more installed clients referred to as nodes. The total set of nodes is a number “n”. Each user may have one or more keys “k”. Each of the nodes may store one or more backup sets, where each backup set may contain or include a tree of files and directories, or some other data structure. For each file, metadata for the file may be stored in the management service, such as for example in a management service server resident or coupled database. For each file, a number “m” of the other n-1 nodes are chosen to disperse the data to. For each of the m dispersals, one record exists in the storage information table in the management service. It may be appreciated that when dynamic management is advantageously implemented, the value of m can be tuned and optimized, and the nodes among the n that are named in the m storage information records can change as the management service chooses, to facilitate improved availability, reliability, and appropriate redundancy.

With reference to File Backups Table 215, which describes storage vectors as used in conjunction with the information dispersal algorithm and procedure, it may be appreciated that while embodiments of the invention may provide for the individual storage vectors required to rebuild the user data to be stored at different locations, even possibly including at one or more user locations, advantageously and preferably the individual storage vectors or other means for identifying information for reconstructing or rebuilding a particular user backup content are stored only on the inventive service. This allows for the added security of not storing the individual storage vectors on the peer nodes as well as not losing the storage vectors if and when the original user content source device fails (which is the whole reason for backing up in the first place). Where storage of the individual storage vectors or other content rebuilding or recreation information is provided by a storage device on the management service, such as on a content service provider management server, the storage may be made in such a manner that such storage is redundant to any needed degree. The redundant storage of the storage vectors may, for example, be provided by mirroring, any applicable RAID type redundancy, the maintenance of multiple separate storage devices at different physical locations, or as otherwise known in the art. Storage vectors are further described relative to the information dispersal algorithm and procedure as well as an exemplary database and database tables.

TABLE 1
Summary of Database Tables In An Embodiment Of The Invention (Including Optional Tables Not Required In All Embodiments)

Table Name | Table Function or Storage
Users Table | Primary user table; contains a single row per each unique user in the system.
User Keys Table | Stores the encryption keys used to validate the users and nodes.
Nodes Table | Manages the storage nodes. Storage nodes may be local or anywhere in the world. Storage nodes may be owned by users or may be provided by non-user entities.
Node Pings Table | Historical monitored ping information from storage nodes in the world.
Files Table | Stores the folder hierarchy and metadata for files of a given user.
File Objects Table | Stores any relevant data objects for a file that are often used for presentation of the file. Examples of this include thumbnails for image or video files or “snippets” of textual files.
File Backups Table | Stores the backup vector details for each file that was backed up.
Shares Table | Stores the user's defined shares.
Share Users Table | Stores the mapping from shares to users allowed to view them.
Tags Table | Stores the user's defined tags.
Tag Files Table | Stores the user's mapping between tags and files.
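For concreteness, the non-optional tables of Table 1 might be declared along the following lines using Python's built-in sqlite3 module. Every column name and type here is an assumption made for illustration; the specification names the tables and their purposes but not their fields, and, as noted above, a differently organized database would serve equally well.

```python
import sqlite3

# Hypothetical columns; only the table names and roles come from Table 1.
SCHEMA = """
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE           -- one row per unique user
);
CREATE TABLE user_keys (
    key_id       INTEGER PRIMARY KEY,
    user_id      INTEGER NOT NULL REFERENCES users(user_id),
    key_material BLOB NOT NULL
);
CREATE TABLE nodes (
    node_id       INTEGER PRIMARY KEY,
    owner_user_id INTEGER REFERENCES users(user_id),  -- NULL for non-user nodes
    address       TEXT
);
CREATE TABLE node_pings (
    ping_id   INTEGER PRIMARY KEY,
    node_id   INTEGER NOT NULL REFERENCES nodes(node_id),
    pinged_at TEXT NOT NULL,
    responded INTEGER NOT NULL             -- 1 if the node answered
);
CREATE TABLE files (
    file_id   INTEGER PRIMARY KEY,
    user_id   INTEGER NOT NULL REFERENCES users(user_id),
    parent_id INTEGER REFERENCES files(file_id),      -- folder hierarchy
    name      TEXT NOT NULL
);
CREATE TABLE file_backups (
    backup_id   INTEGER PRIMARY KEY,
    file_id     INTEGER NOT NULL REFERENCES files(file_id),
    node_id     INTEGER NOT NULL REFERENCES nodes(node_id),
    piece_index INTEGER NOT NULL,
    ida_vector  TEXT NOT NULL              -- e.g. comma-separated coefficients
);
"""


def open_management_db(path=":memory:"):
    """Create the sketch schema in a new SQLite database and return the connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


conn = open_management_db()
conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))
print(conn.execute("SELECT COUNT(*) FROM users").fetchone()[0])  # 1
```

Storing one file_backups row per dispersed piece keeps the vectors on the management service, so a restoration does not depend on the originating node.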

While a user may usually have at least one user node, a user may have zero nodes or a user may have a plurality of nodes where the plurality of nodes may be N nodes where N is an integer value. The number of keys may be different from the number of nodes, and that may be different from the number of users in general. In one embodiment, there is a single key per user, but there is no requirement for a single key per user. In one embodiment, there must be at least one key per user who is submitting or uploading encrypted data to the network. There is no requirement for a key or for a separate key for a user who may be retrieving data from the network, such as an invited guest who is invited to share data or content of a registered user. The invited user or member of a share group would use the key of the registered user who formed the share group and a password that is chosen by them or auto-generated by the system if they are a newly invited user. In another embodiment, the share groups have different keys. In another embodiment, each file has its own key and the different members of the share group have different keys used to decrypt an encrypted copy of the file key.

Storage vectors comprise server database entries that tell the system where the dispersed pieces or segments of a user's files, folders, data, and/or content are stored and which IDA vectors were used for each segment. Embodiments of the system and service provide for dynamic redispersement of a user's data so that the storage vectors may usually change over time. For example, if segments of the registered user's data are stored on five different storage nodes, there will be five vectors associated with that user's data in the database. If the user data is redispersed to six storage nodes, then both the number of nodes and the identity of the nodes will change. Alternative strategies may be utilized to identify storage locations so that there may be more or fewer vectors than the number of storage nodes. Alternatively, other database or identification means may be used to identify the storage nodes. In one embodiment, a single database vector may be used to identify all of the storage nodes associated with a particular user's data. In terms of the number of nodes involved in storing a user's data, there can be a tradeoff of storage space for flexibility and scalability. Advantageously, a plurality of storage nodes identified by a plurality of storage vectors include at least some storage nodes at geographically different or diverse locations.

In one embodiment, each storage entity that gets put into the system, whether at the file level, a set of files level, a block level, or at an overall system level, needs a storage vector for each of the devices where that data will be stored. Therefore by way of example, if a file is going to be broken up into 16 pieces or segments for storage in 16 different storage entities, one would need 16 vectors for that file. These same vectors may be used both to break up the storage entity prior to dispersal and for recombination as a reverse dispersal during retrieval or restoration.
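The 16-vector example may be pictured with a small record type. The sketch below is illustrative only: the StorageVector fields, the plan_dispersal helper, the choice m=10, and the use of Vandermonde-style coefficients mod 257 are all assumptions, chosen so that each record both names a storage node and carries the IDA vector used for the piece held there.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class StorageVector:
    """Hypothetical database entry: where one piece lives and how it was made."""
    file_id: str
    piece_index: int
    node_id: str
    ida_vector: Tuple[int, ...]


def plan_dispersal(file_id: str, node_ids: List[str], m: int, p: int = 257) -> List[StorageVector]:
    """Create one storage vector per target node.

    Coefficients are rows (1, x, ..., x^(m-1)) mod p with distinct x, an assumed
    choice under which any m of the vectors suffice for recombination.
    """
    return [
        StorageVector(file_id, i, node, tuple(pow(i + 1, j, p) for j in range(m)))
        for i, node in enumerate(node_ids)
    ]


nodes = [f"node-{i}" for i in range(16)]
vectors = plan_dispersal("photo.jpg", nodes, m=10)
print(len(vectors))               # 16
print(vectors[1].ida_vector[:3])  # (1, 2, 4)
```

Applying the same helper to a user identifier or a node identifier instead of a file identifier would give the user-level or node-level variants.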

Technically this can be applied at the entire user level so that each user has, for example, only 16 vectors. It may also be applied at the node level so that each node that the user has includes 16 vectors. In another embodiment, it may be applied at the file level so that each of the user's files has 16 vectors. Storage may be managed to convert the storage between and among different levels, such as from a user level, block level, file level or the like to a different storage level. This may be accomplished by performing a partial or total reverse IDA to recover at least the data that is desired to be dispersed at a different level, and then performing the redispersal using the forward IDA.

Dispersal at the node level provides that each user has one or more nodes from which they are going to submit data to the network. Dispersal at the file level provides that each file is to be separately broken up or segmented by the IDA and recovered by a corresponding reverse IDA. Dispersal may also or alternatively be accomplished at a different level wherein a block of data or a set of files is processed by the IDA. Embodiments of the invention may provide for any arbitrary set or collection of data to be dispersed together by the IDA and then retrieved or recovered by the reverse IDA.

In one embodiment, each vector is or permits the identification of a single storage node and a single portion of the user data. The collection or set of vectors (for example the 16 vectors in the above example) specifies all of the locations where a user's data is stored, and it is some tunable subset (including the full set) of these vectors that permits complete retrieval and restoration of the totality of the user's backed up data. Less than this subset will garner none of the user's data, and thus this defined subset of backed up nodes must all be compromised in order to retrieve any of the user's data. With the option of the additional encryption step before dispersal, the key used for encryption must also be compromised from the management service before the original data can be retrieved.

In one embodiment, a complete storage reference of a file or data set has a plurality of mathematical vectors that are adapted for transforming the data to be dispersed and stored in accordance with the information dispersal algorithm. A vector is a sequence or set of numbers and is what one uses to perform the mathematical operations on the data that is going to be stored on the storage nodes. Recovery of the original backup is achieved by a process that is somewhat the inverse or reverse of the process used to store and disperse the original data, with the exception that only Z of the M nodes need to be accessed, where Z<M. The inverse using Z of the vectors, which is some subset less than the total number of nodes M on which storage occurred (e.g., Z<M), consists of taking the Z vectors out of the total set of M vectors for the storage set, doing an inverse matrix transformation, and applying that to the data obtained back from the Z nodes. This permits complete rebuilding of the data set.

Storage vector references are entries in the database that indicate, for a given user's file or block of data (depending on the file, node, user, network, or other level), where that file is broken up and stored amongst the storage nodes of the network. The database and the entries of storage references in the database are managed by the management server, and may usually change over time as the storage peer node reliability may change over time and user data is moved from less reliable nodes to more reliable nodes.

Having now described a particular embodiment of a management database that may be used with the invention relative to the embodiment in FIG. 2, attention is now directed to a similar embodiment of FIG. 3 that adds certain optional compression and decompression, optional encryption and decryption. FIG. 4 is a diagrammatic illustration that separately shows an embodiment of the inventive database 106. FIG. 5 shows additional detail for the user key table, the file backups or storage vector table, and the password hash storage table illustrating the multiple entries typical of these tables.

With further reference to the embodiment in FIG. 3, the optional compression and decompression, and/or optional encryption and decryption may be provided either in the server 102 or in the peer node (such as in peer node 130-1). In the event that the optional compression and decompression and/or optional encryption and decryption is present only on the server 102, then where these processing features are desired they may be performed by the server. Where the optional compression and decompression, and/or optional encryption and decryption are present on the peer node client 130-1, they are advantageously performed there to unburden the server processor and to leverage the processing capabilities of the peer nodes. The optional compression and decompression, and/or optional encryption and decryption may alternatively be outsourced to another processor by the server or by the peer node client. The system in FIG. 3 therefore identifies an optional server resident data compression functional block 351, an optional server resident data decompression functional block 352, an optional server resident data encryption functional block 353, and an optional server resident data decryption functional block 354. The system of FIG. 3 also identifies an optional peer node client resident data compression functional block 361, an optional peer node client resident data decompression functional block 362, an optional peer node client resident data encryption functional block 363, and an optional peer node client resident data decryption functional block 364. It may be understood that these compression, decompression, encryption, and decryption operations may be performed by the server or by the peer node or nodes in any combination.

The rationale for including the optional compression, decompression, encryption, and decryption is now described. Also described is some rationale for performing these optional operations at a particular location in the system.

Optionally, but advantageously, embodiments of the invention may increase the storage efficiency and capacity of the peer network by passing user data or content through a compression algorithm before breaking it into pieces. This compression may be performed either on the user node peer side for whom the backup is being performed (upload or insertion side), at the receiver storage node side (download), by the management server, or at some intermediary anywhere between the source peer node and the destination peer node or nodes, such as for example by an optional server. Depending on the actual files, folders or set of files, or other content the particular user chooses to back up in the storage network, this compression may usually assist in minimizing the overhead of the reliability choices made by the inventive management service, since the overhead for the choice of the information dispersal algorithm nodes is inversely proportional to extra space used.

It may also be appreciated that if the compression is performed at the content source node, then the bandwidth required over the peer-to-peer or peer-to-server-to-peer connection will be reduced. The determination as to where the compression is best accomplished may be based on a user selection, an automatic selection by client software or algorithms or procedures in the source node device, by the manager, or by other means in the system. Compression is also optional, but particularly when the compression is lossless, the advantages of compression, including a reduction of storage space volume on the storage nodes and the reduction in bandwidth for communicating the data over the network connections, are clearly present and should advantageously be implemented in a practical system.

In a further optional but advantageous enhancement to the inventive system and method, and to advantageously increase the security of the original user data, an additional data encryption may be performed on the user data. Preferably, the encryption is performed on the compressed user data. In one embodiment, the encryption may be a key based encryption, although other encryption schemes as are known in the art may alternatively be utilized. In one embodiment, the encryption scheme may be a symmetric AES encryption scheme, in which an AES encryption pass or process is applied to the compressed user data before calling or performing the inventive information dispersal algorithm. The AES encryption scheme is a key-based scheme, and the key for the AES encryption pass may also advantageously be stored in a storage of the inventive system and service in order to increase security and reliability. The key itself may optionally but additionally and advantageously be protected by the user's password to avoid and circumvent potential data or content attacks against the inventive system and service itself. In one embodiment, AES encryption and/or decryption may be performed on the server 102 to provide faster and more efficient encryption/decryption and to offload processor 112 from these tasks.
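By way of illustration only, the ordering described above (compress first, then encrypt, then hand the result to the information dispersal algorithm, with the reverse order on restoration) may be sketched as follows. The Python standard library does not include AES, so the function `keystream_xor` below is a stand-in keystream cipher built from SHA-256 that is used purely to make the ordering runnable; it is not AES and is not the encryption of the described embodiment. All function names and the 16-byte nonce are assumptions of this example.

```python
import hashlib
import secrets
import zlib


def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # STAND-IN for the AES pass: XOR the data with a SHA-256 based keystream.
    # Applying it twice with the same key and nonce returns the original bytes.
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + nonce + offset.to_bytes(8, "big")).digest()
        out.extend(a ^ b for a, b in zip(data[offset:offset + 32], block))
    return bytes(out)


def prepare_for_dispersal(plain: bytes, key: bytes) -> bytes:
    # Upload side: lossless compression first, then the key-based encryption pass.
    nonce = secrets.token_bytes(16)
    compressed = zlib.compress(plain)
    return nonce + keystream_xor(key, nonce, compressed)


def restore_after_recovery(blob: bytes, key: bytes) -> bytes:
    # Restore side: decrypt first, then decompress (the reverse of the upload order).
    nonce, body = blob[:16], blob[16:]
    return zlib.decompress(keystream_xor(key, nonce, body))


key = secrets.token_bytes(32)          # hypothetical per-user key held by the service
original = b"photo " * 200
blob = prepare_for_dispersal(original, key)
round_trip = restore_after_recovery(blob, key)
```

Compressing before encrypting matters in this sketch because encrypted output is effectively incompressible; reversing the order would forfeit the storage and bandwidth savings discussed above.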

It may be appreciated that the encryption is optional and need not be provided by the system at all if privacy or security beyond that provided by the IDA itself is not required or desired. In practical terms, however, implementing a backup system wherein one user's data, files, folders, or other content are stored even in pieces on another user's computer or storage device without some form of security or encryption is disadvantageous from a business perspective.

During rebuilding or reconstruction of a user's content, the steps of decrypting and decompressing are essentially reversed. In one non-limiting embodiment, encryption and decryption are key based and the key or keys are stored in the management server database.

The rebuilding or reconstruction will also require the decompression of the decrypted user data or content. When a registered user desires to retrieve, recover, or rebuild all or a part of a data set that has been backed up to the service from a computer with which the user registered, the user may make a request for such retrieval through the client program resident on his computer. The request is communicated to the management service, which stores the storage vectors in its database identifying all of the storage nodes where segments of the particular user's data are stored. Because the data is redundantly stored on more storage nodes than are required, the service manager, which stores the current set of storage vectors that may have been dynamically modified since the original upload or insertion, may identify a subset of currently available and reliable nodes and direct the communication or transmittal of the plurality of portions to the requesting user client. Alternatively, the client may receive instructions from the management service and directly request the subset of the previously dispersed segments. The client computer may then perform what may be considered an inverse or reverse information dispersal algorithm (RIDA) using the subset of non-redundant segments (or a greater number or even all of the segments if some additional error checking or error correction might occur by such use). The original data set has thus been recovered and restored to the owner's computer and the backup and restore operation is successful. Typically, the user will wish to maintain the dispersed backup so that a future retrieval or restoration is possible. In at least one embodiment, the user is given an option to delete his backup data set at any time. Although this is disadvantageous to the user, some users may prefer to have this option for privacy reasons. The management service may then direct the deletion of the dispersed segments identified to the user, either by actual deletion and overwriting or by deletion from the directory structure so that they cannot be located or accessed, and so that the storage space may ultimately be utilized for other storage.

In an alternative embodiment, where access by a registered user having a backup on the service is made from a computer or information appliance that does not have the service client installed, then either the client may be downloaded and installed such as for a new user, or the retrieval and restoration may less advantageously be performed through a generic Internet or web interface. Various plug-ins and Active-X controls may be required on the retrieving computer or information appliance device to facilitate the retrieval and reconstruction or, when required, the server may broker the retrieval or restoration to the computer or information appliance from which the validated (e.g. proper user ID and user password) request was made. It may be appreciated that any of the IDA, RIDA, compression, decompression, encryption, and/or decryption may be performed on any of the nodes, management server, or outsourced to another entity coupled on the network, but that certain processes and architectures are more advantageous than others either because of increases in computing power, security, communication link bandwidth, storage device bandwidth, or other factors.

When any of compression, decompression, encryption, and/or decryption are provided in embodiments of the invention, they may be provided by or within any of the registered user client machine that owns the data, the management service server that is operating to control the service, one of the storage nodes to which a portion of the registered user's data is to be dispersively stored, or some combination of these.

Advantageously, compression, encryption, and generation of the plurality of backup segments occurs on the registered user's machine that is uploading the backup to the peer network. It is advantageous to perform these operations here because the uncompressed and unencrypted data is present on the upload client computer and performing these operations on that computer advantageously uses the potentially otherwise unused processing power of that computer. It also prevents placing any unencrypted data on the network in a way that it might be intercepted, and reduces network bandwidth requirements. The upload client user computer may also advantageously generate the plurality of segments and communicate (independently or in coordination with commands from the management server) the segments to the plurality of storage nodes in accordance with the information dispersal algorithm computation. The information dispersal algorithm for any particular data set may be performed either on the upload user computer side or by the management service server, but the most bandwidth efficient choice would be for the client device to perform the algorithm and communicate directly with the peer nodes for storage.

When the management service determines that one or more storage nodes have become unreliable for whatever reason, all or portions of the user's data may be redispersed to a different set of storage nodes (where some of the nodes used may be the same and at least one will be a different node). The redistribution of the data does not require either the decompression or decryption of the data. In one embodiment, the encrypted and compressed data is merely moved intact from one storage node or set of storage nodes to another storage node or set of storage nodes. In some embodiments, only the data earlier dispersed to what has become an unreliable storage node will be moved to a more reliable storage node. In one embodiment, if the unreliable storage node is still available so that the data stored there can be accessed, then the stored data set may be moved or copied to another reliable storage node. In this embodiment, the storage vectors in the management server database are updated with the new storage information. In the event that the particular storage node cannot be accessed, then the data may be regenerated from the data stored redundantly on the other remaining storage nodes. Alternatively, the system may reapply the information dispersal algorithm and generate a new data dispersal strategy. The regeneration of data or the redispersal of data from an unreliable node may depend on how the original data was processed and dispersed, and in particular may depend on the level at which the data was processed.
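By way of illustration only, the relocation of a segment away from an unreliable storage node may be sketched as follows. The in-memory dictionaries standing in for the management database and the storage nodes, the function name `relocate_segment`, and the `regenerate` callback are all assumptions of this example.

```python
def relocate_segment(vector_table, node_stores, file_id, bad_node, new_node, regenerate):
    """Move one file's segment off bad_node and update the storage vector entry.

    vector_table: file_id -> list of node ids, one per segment (hypothetical layout)
    node_stores:  node id -> {file_id: segment bytes}; a missing node id models
                  a storage node that can no longer be accessed
    regenerate:   callback that rebuilds a segment from the remaining nodes
    """
    entry = vector_table[file_id]
    position = entry.index(bad_node)
    store = node_stores.get(bad_node)
    if store is not None and file_id in store:
        # Node is unreliable but still reachable: copy the segment intact,
        # with no decryption or decompression.
        segment = store[file_id]
        how = "copied"
    else:
        # Node cannot be accessed: rebuild from the redundantly stored data.
        segment = regenerate(file_id, position)
        how = "regenerated"
    node_stores[new_node][file_id] = segment
    entry[position] = new_node  # the storage vectors now point at the new node
    return how


node_stores = {"a": {"f1": b"seg0"}, "b": {"f1": b"seg1"}, "c": {}, "d": {}}
vector_table = {"f1": ["a", "b"]}

first = relocate_segment(vector_table, node_stores, "f1", "a", "c",
                         regenerate=lambda fid, pos: b"rebuilt")
del node_stores["b"]  # node "b" drops off the network entirely
second = relocate_segment(vector_table, node_stores, "f1", "b", "d",
                          regenerate=lambda fid, pos: b"rebuilt")
```

In both branches the only state the manager changes is the storage vector entry, which is consistent with the statement above that redistribution does not require decompression or decryption.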

It may be appreciated that the data may be dispersed at any one or more of various hierarchical levels. In one embodiment, the dispersal may be performed at the file level so that each file may be separately and independently dispersed to a plurality of peer storage nodes.

In another embodiment, all files on the user computer that are identified as new (and optionally those identified as having been changed) since the last backup may be processed together and dispersed. In another embodiment, the entire set of the user's data is reprocessed and redispersed, but this latter option is disadvantageous from the standpoint of inefficient use of processing power and network bandwidth. Advantageously, for purposes of security, no matter what level the information dispersal algorithm or procedure is executed at (for example at the individual file level, at the set of files level, at the data or file block level, or at any other level), no entire file is ever stored on a single peer storage device. Even in a case where a file or block of data is a single byte, the single byte file or block would still be padded to a multiple of Z bytes and dispersed to M nodes. The result would be M files of length Z where M and Z are as defined above. Therefore security for files or blocks of data of any size is assured.

The manner of dispersal may be different for different portions or backups of the user data. Particularly following the initial post-registration or insertion of a large set of user data to the service, a file set or block mode dispersal may be most appropriate. However, when additional files, folders, content, or other data are subsequently added, it may be more efficient and advantageous to disperse the new data at the individual file level, or at file set level corresponding to only the new or changed data, and not redisperse all of the data on the user's computer each time there is a change.

When either the optional file or content sharing features or the web access features of embodiments of the invention are considered, backing up a user's data at the individual file level has some advantages, including an ability to retrieve any single file with less computational burden and lower bandwidth requirements.

Independent of the level at which data dispersal is conducted, the database on the management server stores information in the form of storage vectors that inform the manager where all of the files, folders, content, or data are stored and enable the manager to perform the retrieval or reconstruction.

In the event that the user, or another person authorized by the user, desires to view or otherwise access only a limited portion of the total backed up data or content for the user, an alternative procedure for partial reconstruction may be utilized. Again, this may depend on the level at which the data that is desired to be viewed was processed and dispersed.

The inventive system and method have strong security, and such strong security is unusual for the backup storage industry. In at least one embodiment of the system and method there is a very strong separation of user data and user key information. For example, it may be appreciated that (i) although the user of course has access to and stores his original data on his computer or information appliance, the user never sees or stores the user key (which is only stored on the management server); (ii) the management server never holds the original raw user data (and in preferred embodiments, never sees or holds the raw user data); and (iii) the data storage nodes never see the user data or user keys and may only store and have the potential of seeing a part of the dispersed data that was advantageously encrypted and compressed prior to the dispersal. Therefore the user data and the key are never in the same location except for a very short temporal window during encryption or decryption. Therefore even if two nodes could successfully be attacked and compromised, such compromise would not be sufficient to allow unauthorized access and reconstruction of the user's actual data, files, folders, or other content stored on the service. One would need to have the user information including the user ID and the user password (plus any secondary authentication optionally in place).

The IDA is similar to that described in the papers, so one would need access to several nodes in order to reconstruct the user's original data or a portion of it.

Since the user data was advantageously encrypted through a cipher before being split up by the IDA, one must have, in addition to access to some number Z of the data nodes, access to the keys for the user, which are only stored in the management server.

It may be noted that the user key may be temporarily resident in RAM in the client for the time it will take to perform the optional but desirable encryption and decryption (when such encryption and decryption are performed by the client), but it does not live and is not stored in any nonvolatile form on the client side machine, and the client software is architected to obfuscate this usage of the key and obliterate the RAM storage by overwriting it with random data.

For retrieval and restoration back to the user client machine, all communication of data from the storage nodes is of encrypted and compressed data where the retrieving and restoring computer performs the decryption and decompression locally.

For retrieval in the file sharing mode or when files are to be restored to a different computer or machine, enough credentials must be provided to satisfy the system manager that the requested retrieval or restoration should be authorized. The management server may broker the decryption and decompression through either or both of the file sharing block and the web access block to the requesting user. The user can therefore recover all of their data to a new or different computer or information appliance.

It may be appreciated that since, for at least one embodiment, the goal is to achieve a measure of consumer level security, and the management server only stores an MD5 or SHA1 hash of the user password, only the user has the actual password, and therefore for at least some embodiments of the invention, a user providing a password will be entitled to retrieve and reconstruct their data. Other embodiments of the invention may provide additional security or require additional enrollment (such as for example the use of biometric input) and require additional authentication for restoration or retrieval to another new device. Registration of a device may also be required at the time of user registration, and retrieval and restoration may require that the restoration be matched to the same device unless additional information is provided. Client nodes will communicate with the management service over HTTPS with client and server certificates. This solution allows the client device to validate that the service is who it claims to be by validating the service's certificate, and allows the service to validate that the client is who it claims to be by validating its certificate.

Aside from breaking into the required Z storage nodes and the management server, the only way to break into the system through the interface is to provide a password which hashes to the MD5 or SHA1 hash stored on the management server. As increased security may be required or desired, such additional security or authentication may readily be incorporated into the inventive system and method, such as a second-stage authentication system.
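By way of illustration only, a password check against a stored hash of the kind described above may be sketched as follows. The sketch assumes an unsalted SHA1 hex digest as the stored form, which is one reading of the description; the function names are hypothetical.

```python
import hashlib
import hmac


def hash_password(password: str) -> str:
    # The server stores only this digest, never the password itself.
    return hashlib.sha1(password.encode("utf-8")).hexdigest()


def password_matches(candidate: str, stored_hash: str) -> bool:
    # Constant-time comparison of the candidate's digest with the stored digest.
    return hmac.compare_digest(hash_password(candidate), stored_hash)


stored = hash_password("abc")              # what the management server would keep
accepted = password_matches("abc", stored)
rejected = password_matches("abd", stored)
```

A system built today would more likely store a salted, deliberately slow derivation (for example `hashlib.pbkdf2_hmac`) rather than a bare MD5 or SHA1 digest; the check itself would have the same shape.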

Even for web access, it is not the management server that is pulling the data down directly; there may still be a client side application in the form of an applet or plug-in. If the local client, such as for example a thin web access terminal or low end computer, does not have sufficient resources or capabilities, the server may broker the retrieval and/or restoration. An inquiry means or program may be utilized to determine if a computer system or information appliance has sufficient processing capability to perform the retrieval or restoration, including any required decompression and/or decryption. The threshold for processing capability may be fixed or may depend on an assessment of the time that may be required given the data set to be retrieved or restored.

The embodiment in FIG. 3 includes a web access node or terminal that may be used by a registered user, a share user, or in some embodiments of the invention by a new user attempting to make a registration.

In this non-limiting embodiment, the web access node or terminal permits a person (or user) to access the system without the benefit of an earlier registration and may advantageously permit the person to access the system and service without the client software, applet, or application installed on that web access node or terminal. In some instances, such as at an airport Internet access location, an Internet café, or other somewhat public web or Internet access locations (that may be free or fee based), a user is not permitted to download client software, or the local system may deny such download even if the user desires or needs such download.

The web-based access node or terminal may therefore only have a generic hardware and software configuration and no ability for software to be added to support the user's desired access. In this situation, the user may rely more on capabilities of the server and minimally, if at all, upon the capabilities of the web access node, terminal, or device.

In the non-limiting but exemplary embodiment of FIG. 3, the web access node or terminal 361 includes a processor 362 and a random access memory (RAM) 364 coupled to the processor. The web access node or terminal may also include some local storage 375, such as a hard disk drive, solid state memory, or the like for storing an operating system and application programs or other code. Advantageously, the web access node or terminal will include an Internet or Web browser software application 370 and a network interface module 371 such as a wired network interface card (NIC) or a wireless link, or any other communication interface that permits connection to the same network (possibly through any number of bridges, routers or network translation layers) to which the server 102 is connected.

The management server may provide the capability to retrieve and/or restore user data from a generic web browser that does not include the features and capabilities of the inventive client software, applet, plug-in, or the like. That is, the management server can put the user data back together (e.g., the reverse of the dispersal) and do the decryption and decompression, push the data back to the retrieving computer, and provide a display of the information to a generic screen using the generic browser. In at least one non-limiting embodiment, an Active-X component or other program will be provided on the retrieving computer to offload the processing from the management server to the machine on which a portion or all of the user data is to be retrieved and/or restored. That component will reassemble the segments and then decrypt and decompress the data in a process that is essentially the reverse of the upload associated with the backup.

However, the server provided approach is disadvantageous in many ways. In particular, considerable server processing power is utilized and bandwidth is consumed. This approach is therefore usually limited to retrieval for practical and business reasons. In at least some non-limiting embodiments, the initial upload and information dispersal is performed by the registered client side computer or information appliance. This is not a limitation of the invention, but a practical preference.

For retrieval access, the capabilities provided in the web access node or terminal may for example be provided by a Java plug-in or Active-X control, or by analogous means, that are accessed from the service web site or server, and which may usually be available even on low level computers or terminals. It would thus be possible to perform the reverse of the information dispersal algorithm, decryption and decompression, and building and putting the files back into the file system. In general, so long as a communication can be established with the server 102, and the user can add a storage device that provides accessible storage either for uploading or downloading data to or from the system and service, the user will be able to interact with the system. In one embodiment, the user may provide this storage using a USB flash memory card or other similar means.

The inventive system and method leverages the unused and available spare space on consumers' PCs or other information appliances that are available or may become available in the future to store other users' or consumers' backup content. It may be appreciated in light of the description provided herein that future generators of digital content may use or store the generated digital content on devices, storage systems, information appliances, or media devices different from personal computers, and that embodiments of the invention pertain to user or consumer nodes different from personal computers and that the storage devices and subsystems within such nodes may be other than hard disk drives, optical drives, solid state memories, or any other storage device or media.

Peer-to-peer communication and networking technologies and methodologies are combined with a service manager advantageously located on a management server to direct and control operation of the system. This managed peer-to-peer hybrid configuration is leveraged to enable these individual personal computers, information appliances, (or other node or networked devices) to communicate directly with each other for moving around and/or transferring this backed-up content. The peer nodes however do not operate by themselves as they would or might operate in a pure peer-to-peer network or file sharing or file backup architecture. The manager of the backup service and method is integrally involved with initial insert of file, folder, content and/or data storage and dispersion into the network nodes; and participates in the retrieval, recovery, and restoration of the original files, folders, content, and/or data to the originating computer or to a different computer or device. The service manager may also continually manage the peer network to assure reliable operation and integrity as described elsewhere herein.

In addition to the inventive system architecture, the invention also provides a service and a service manager component that manages the individual computers or information storage nodes and storage devices at those nodes on the Internet to decide which peer nodes are the most appropriate nodes on which to store individual user's content. By way of example but not limitation, the selection of the most appropriate node may be based on one, more than one, or any combination of such factors as: total storage capacity available, history of reliability or failure, uptime or availability on the network, storage device bandwidth, existing backup for the same or other users, presence on-line or on-network, actual presence in one physical location so that if it is a mobile device like a notebook computer it may be marked as less reliable than a fixed computer (and gets a lower score), network connection and speed and/or bandwidth between the peers or between either peer and any optional server, geographic location of the peer, relative time between or absolute time at the backup user location and other subscribers' locations, geo-location (IP based) with preference to a higher score for a storage node geographically close to the user rather than across the world, national or legal restrictions relative to content, Internet Protocol based location determination, determination of device mobility or stationary character, and any combination of these. Other factors appropriate to the network as a whole, to particular users or user groups and/or locations may also be considered.
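By way of illustration only, a scoring of candidate storage nodes using a few of the factors listed above may be sketched as follows. The description does not give a formula; the chosen factors, the weights, the field names, and the function names are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class PeerNode:
    node_id: str
    free_gb: float       # total storage capacity available
    uptime: float        # fraction of time on the network, 0.0 to 1.0
    failures: int        # history of failures
    mobile: bool         # e.g. a notebook computer that changes location
    distance_km: float   # IP-based geo-location distance from the backup user


def score(node: PeerNode) -> float:
    # Hypothetical weights: higher is better.
    s = 40 * node.uptime
    s += 20 * min(node.free_gb, 100) / 100
    s -= 5 * node.failures
    s -= 15 if node.mobile else 0                   # mobile devices get a lower score
    s -= 10 * min(node.distance_km, 10000) / 10000  # prefer geographically close nodes
    return s


def pick_nodes(nodes, m):
    # Choose the m best-scoring nodes to receive the dispersed segments.
    return sorted(nodes, key=score, reverse=True)[:m]


candidates = [
    PeerNode("notebook", free_gb=20, uptime=0.40, failures=1, mobile=True, distance_km=50),
    PeerNode("desktop", free_gb=200, uptime=0.99, failures=0, mobile=False, distance_km=50),
    PeerNode("far-desktop", free_gb=100, uptime=0.95, failures=0, mobile=False, distance_km=9000),
]
chosen = [n.node_id for n in pick_nodes(candidates, m=2)]
```

A weighted sum is only one possible combination rule; factors such as national or legal restrictions relative to content would more naturally act as hard filters applied before any scoring.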

In the event that a user requesting registration with the service, and therefore needing to provide storage on his/her computer's storage device as one of the storage nodes that other users may access, does not appear to have a reliable storage device, the system may message the user indicating the assessment that his/her computer has a low reliability, and that if the user wants to continue using the service, that user will need to increase the reliability of their storage to the service community, either by taking steps to increase reliability or by purchasing some other after-market storage solution like a NAS. Such steps may for example include one or more of leaving the computer connected on-line, outsourcing the storage responsibility to another entity, identifying an on-line storage device at another location, or taking other measures to increase reliability of their storage contribution. In one embodiment, the user may pay the service to provide the backup of the user's files if they are unable or unwilling to increase their reliability. In another embodiment, the user may pay an outsourced entity, such as an independent entity of the user's choosing or a partner of the service, to provide storage on their behalf. This is one of the reasons a single user may have multiple storage nodes, as they may store and back up from multiple devices at multiple locations.

It may be appreciated that unreliability is not an indication of bad character or actions of a user. For example, notebook computers may have small hard disks, be offline a lot, and move from location to location. As such the notebook computer may appear to be an unreliable storage node. This provides one rationale for a business model that includes partnering with others who can provide reliable storage to provide the user with an ability to back up data while not actually using their own notebook computer for storing the data of others. One such arrangement is partnering with a disc drive or other storage device manufacturer to purchase and set up a disc drive to be used as the user's surrogate storage node separate from the user's computer. Alternatively, the user may arrange to use a portion of storage on a disc farm or other shared storage facility. These and other ways are referred to as outsourcing storage.

This management methodology may actively choose to change the peers and peer node storage device that a user's content is backed up on to resolve issues of unreliable system and/or storage devices. For example, if it is determined that a peer node device or its storage is frequently offline or unavailable, or that some data or content are received with apparent errors (correctable or uncorrectable) at a frequency that is above some acceptable error threshold, then the PC and storage device manager may mark that node and its device as a node or device not to be utilized for future storage of other users' content backup. Policies may optionally be implemented to alter the terms under which the owner of that node device and storage uses the service, such that, since that user is not providing reliable storage for other users' backups, the user may be invited to upgrade his/her equipment, cease using the backup service, pay a fee or an additional fee for accessing other users' storage, or take other action as may be suggested or required by the system provider.
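By way of illustration only, the threshold test for marking a node as not to be utilized for future storage may be sketched as follows. The description refers only to "some acceptable error threshold"; the specific threshold values, the counters, and the function name are assumptions of this example.

```python
def should_exclude(transfers, errored_transfers, availability_checks, offline_checks,
                   max_error_rate=0.05, max_offline_rate=0.5):
    # Hypothetical policy: exclude a node from future storage when either its
    # observed error frequency or its observed offline frequency exceeds a threshold.
    error_rate = errored_transfers / transfers if transfers else 0.0
    offline_rate = offline_checks / availability_checks if availability_checks else 0.0
    return error_rate > max_error_rate or offline_rate > max_offline_rate


healthy = should_exclude(transfers=200, errored_transfers=2,
                         availability_checks=100, offline_checks=10)
flaky = should_exclude(transfers=200, errored_transfers=30,
                       availability_checks=100, offline_checks=10)
often_offline = should_exclude(transfers=200, errored_transfers=0,
                               availability_checks=100, offline_checks=80)
```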

It may be appreciated in light of the description provided here that, in spite of potential problems that may initially be encountered with a small number of subscriber user computers or storage devices, on the whole, given the built-in redundancy of having a plurality of node storage devices and only requiring a smaller number of such node storage devices to be available in the event a recreation of the user content is required, the inventive content restoration and backup service, system and method using a service manager increases the storage reliability and security far beyond what an individual user or a pure peer-to-peer storage solution could provide on its own.

This enhanced security and reliability are facilitated by a novel Information Dispersal Algorithm (IDA), process and computer program. Certain specific information dispersal algorithms have been known before, and in fact one limited example is the Redundant Array of Independent Discs (RAID) storage methodology and storage subsystem architecture, which may be thought of as one limited special case of information dispersal. The basic idea of information dispersal algorithms is that some original information is able to be broken down or partitioned into a plurality of pieces, but only some subset of the total plurality of pieces is necessary to reconstitute all the original information. Another example of an information dispersal algorithm is suggested in the paper by Michael O. Rabin, entitled “Efficient Dispersal of Information for Security, Load Balancing, and Fault Tolerance” (Journal of the Association for Computing Machinery, Vol. 36, No. 2, April 1989, pp. 335-348), cited in the background of the invention section, and incorporated by reference herein.

The inventive Information Dispersal Algorithm is designed such that the inventive system management block directs and controls the service so that it determines: (i) how many other different users' storage devices will contain some portion of a particular user's backup data, files or other content, and (ii) how many of those different storage devices must be available to reconstitute the user's backup data, files or content. Other differences between conventional approaches and applications of information or data dispersal and the inventive approach and application of information or data dispersal are described elsewhere herein.
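The two parameters (i) and (ii) may be illustrated with a small Rabin-style dispersal sketch. It is not the inventive algorithm itself. It assumes arithmetic modulo the prime 257, evaluation points 1 through m, and hypothetical function names; each group of z bytes is treated as the coefficients of a polynomial that is evaluated once per node, so any z of the m pieces determine the original bytes.

```python
P = 257  # smallest prime above 255, so every byte value is a field element


def disperse(data: bytes, m: int, z: int) -> list:
    """Split data into m pieces (lists of ints) such that any z suffice."""
    padded = data + bytes(-len(data) % z)        # zero-pad to a multiple of z
    pieces = [[] for _ in range(m)]
    for start in range(0, len(padded), z):
        block = padded[start:start + z]
        for node in range(m):
            x, value = node + 1, 0
            for coeff in reversed(block):        # Horner evaluation at x
                value = (value * x + coeff) % P
            pieces[node].append(value)
    return pieces


def _invert(matrix: list) -> list:
    """Invert a square matrix modulo P by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [row[:] + [int(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] % P)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], -1, P)          # modular inverse, Python 3.8+
        aug[col] = [v * inv % P for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col]:
                factor = aug[r][col]
                aug[r] = [(a - factor * b) % P
                          for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def reconstruct(available: dict, z: int, length: int) -> bytes:
    """Rebuild the data from any z pieces, keyed by node index."""
    nodes = sorted(available)[:z]
    vandermonde = [[pow(node + 1, j, P) for j in range(z)] for node in nodes]
    inverse = _invert(vandermonde)
    out = bytearray()
    for pos in range(len(available[nodes[0]])):
        values = [available[node][pos] for node in nodes]
        for row in inverse:
            out.append(sum(c * v for c, v in zip(row, values)) % P)
    return bytes(out[:length])
```

Here m corresponds to the number of storage devices holding a portion of the backup and z to the number that must be available. The sketch requires m below 257 so that the evaluation points stay distinct, and the caller must remember the original length because of the zero padding.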

Even when the inventive system, method, and service may use or be based on an information dispersal algorithm such as described in Rabin, there are differences in the structure, operation, performance, and applicability of the present invention as compared to Rabin.

For example, one of the primary differences is associated with the management component and the dynamic application of an information dispersal algorithm and approach, as compared to the static approach of Rabin or others. The Rabin IDA algorithm alone is not sufficient to provide the features and operability of the present invention.

In the inventive system and method, not only are parameters set initially, but in addition they may be refined, tuned, revised, updated, and optimized on a continuous basis in an automated or manual fashion. The management of the data and the peer storage nodes is cooperatively intertwined with an information dispersal algorithm, especially beyond the point in time of the initial data dispersion.

For example, while a theoretical paper may suggest dispersing data for backup to some number “n” storage nodes, this is not enough, particularly in a consumer personal computer based peer storage system architecture. In fact, the location of the nodes and the reliability of the nodes are important considerations in an Internet environment deployment. At least some of the tradeoffs and optimizations are entirely different from a dispersed information storage system in which virtually all of the peer storage nodes were themselves at managed storage facilities where high reliability might be taken for granted.

In at least some embodiments of the invention, management includes watching or monitoring the nodes by pinging the nodes to test for availability and reliability, tracking historical availability and reliability of the storage nodes, moving or redispersing data when a node has become unreliable or appears to be trending toward unreliability, and other testing and monitoring on a real-time basis. Changes may be made when parameters exceed certain policy or rule based thresholds, and the thresholds themselves may be continually modified. Conventional approaches do not provide these management features that may continuously assess and optionally alter the dispersal of the information, including possibly reassessing and changing the number of nodes, the redundancy factors, and/or other parameters associated with peer node information dispersal.

Recall that with the present invention, it is not necessary to have all of the nodes present to be able to shuffle or create another reliable storage node to replace a node that has gone offline or has shown its unreliability. One only needs some subset Z of M total nodes, and as long as these Z nodes are available, the data can be redispersed to other more reliable nodes. One does not need to wait for a node to become unavailable to replace it; one could notice the trend towards unreliability and act.

One particularly advantageous feature of embodiments of the invention is the capability to continually monitor nodes and, if a condition is observed under which some of the storage nodes are unavailable or unreliable, or are showing a history or pattern of unavailability or unreliability, the data may be redispersed in part or in whole to a different set of storage nodes.
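One way to recognize a history or pattern of unavailability before a node fails outright may be sketched as follows. The class name, the window length, and the 0.8 threshold are hypothetical choices made for illustration; the specification does not prescribe a particular trend test.

```python
from collections import deque


class AvailabilityHistory:
    """Sliding window of probe results for one peer storage node."""

    def __init__(self, window: int = 20):
        self.results = deque(maxlen=window)   # True = node answered the probe

    def record(self, reachable: bool) -> None:
        self.results.append(reachable)

    def availability(self) -> float:
        """Fraction of recent probes that succeeded (1.0 with no history)."""
        return sum(self.results) / len(self.results) if self.results else 1.0

    def trending_unreliable(self, threshold: float = 0.8) -> bool:
        """True when the newer half of the window is below the threshold
        and worse than the older half."""
        if len(self.results) < 4:
            return False
        items = list(self.results)
        half = len(items) // 2
        older = sum(items[:half]) / half
        recent = sum(items[half:]) / (len(items) - half)
        return recent < threshold and recent < older
```

Comparing the two halves of the window lets the manager act on a declining node while enough other nodes are still available to redisperse its data.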

For example, if a file, set of files, or other block of data was originally or is currently dispersed onto sixteen storage nodes, and the service manager recognizes a situation in which three of the sixteen nodes have become unreliable, then the service manager in the server may regenerate and redisperse the data on those unreliable nodes to a new set of storage nodes that have a history of good reliability. The redispersion may be accomplished by simply moving the data intact from each of the three unreliable nodes to three reliable nodes if the data stored on the unreliable nodes are available. If one or more of the unreliable nodes is unavailable, which may often be the reason for a determination of unreliability, then the system manager may either: (i) go back to the source computer and recreate the segments of data that would correspond to the data on the now unreliable and unavailable storage nodes, or (ii) reverse dispersal from the remaining thirteen nodes, to recreate the same segments of data that were stored on the now off-line storage nodes, and communicate or disperse these segments to reliable nodes.
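The choice among these alternatives may be sketched as a small decision function. The order of preference shown (move intact, then the source computer, then reverse dispersal) and the required-node count used in the example are assumptions; the specification presents options (i) and (ii) as alternatives without ranking them.

```python
def plan_redispersal(segment_reachable: dict, healthy_count: int,
                     required: int, source_online: bool) -> str:
    """Pick a way to repopulate segments held by unreliable nodes.

    segment_reachable maps each unreliable node id to whether the segment
    stored on it can still be read. All names here are hypothetical.
    """
    if all(segment_reachable.values()):
        # Every unreliable node still answers: copy its segment unchanged.
        return "move segments intact"
    if source_online:
        # Option (i): recreate the missing segments at the source computer.
        return "recreate segments from source computer"
    if healthy_count >= required:
        # Option (ii): reverse the dispersal from the remaining nodes.
        return "reverse dispersal from remaining nodes"
    return "wait: too few nodes available"
```

With sixteen nodes of which three are unreliable, healthy_count would be thirteen; the value of required depends on the dispersal parameters in force for that data.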

Recall that data may be dispersed at any one or more of various hierarchical levels, and that different embodiments of the invention, or even different backups of the same or different users within a particular implementation of the system and method, may use or apply data dispersal at different hierarchical levels.

Redispersement may be done on the management server, or by a separate server or engine coupled with the server and operating under the direction and control of the management server, rather than on the user computer. It may also be done from another node on the network, such as from the data owner's computer. In one embodiment, the server may pull the compressed encrypted data from reliable storage nodes onto the server to replicate the condition that existed prior to the original IDA application, and then use the IDA again with identified new reliable storage nodes to generate new storage vectors and disperse the data. It is not necessary to decrypt or uncompress that data because the dispersal can be applied to any data either in original form or in the compressed and encrypted form. In another embodiment, the pieces or segments may be moved, duplicated, or otherwise sent to reliable storage nodes.

It may be appreciated from the description provided herein that not only do embodiments of the invention provide for initial upload or insertion of the user data, and later download or retrieval or recovery of that data (optionally including updates and changes to it), but they also provide lifetime dynamic data management and control. By comparison, conventional information dispersal schemes, and even applications of conventional information dispersal schemes, focused on and were limited to static environments. The system architecture, processing, and method of the present invention are dynamic, and the storage configuration is reconfigurable relative to changing from unreliable nodes to reliable nodes and even to changing the number of nodes needed on an individual user or file (or other) basis. For example, if originally a user data set was dispersed to 16 storage nodes and these are found over time to be very reliable, the management server might reduce the number of required nodes to 12 nodes or some other number of nodes with high confidence that the user data can be reconstructed from a subset of these. The manager may continue to dynamically monitor and update so that the number of storage nodes may change up or down from time to time.
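The confidence that user data can be reconstructed from a subset of nodes may be estimated with a simple binomial model. The sketch below assumes that every node is available independently with the same probability, which is an idealization not stated in this specification; the function names and the search limit are hypothetical.

```python
from math import comb


def recovery_probability(total_nodes: int, required_nodes: int,
                         availability: float) -> float:
    """Probability that at least required_nodes of total_nodes are available,
    assuming independent nodes with identical availability."""
    return sum(
        comb(total_nodes, k)
        * availability ** k
        * (1 - availability) ** (total_nodes - k)
        for k in range(required_nodes, total_nodes + 1)
    )


def smallest_node_count(required_nodes: int, availability: float,
                        target: float, limit: int = 64):
    """Smallest total node count meeting the target probability, or None."""
    for total in range(required_nodes, limit + 1):
        if recovery_probability(total, required_nodes, availability) >= target:
            return total
    return None
```

Under this model, a rise in observed availability lowers the node count returned for the same target, which is the kind of adjustment from 16 nodes toward 12 that the management server might make.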

An exemplary non-limiting embodiment of a method 700 for inserting data into the system and service and for maintaining the data in the system and service including dynamic data dispersal is now described relative to the flow chart diagram in FIG. 7.

Data to be backed up is identified (step 702) and optionally but advantageously compressed (step 704) and encrypted (step 706). A determination is made as to whether this is an initial data set or an added data set (step 708). There is not much difference between the two cases, except that if it is added data there will be a need to do a new or additional IDA of at least the new data, and the new data IDA process will occur independently of dynamic IDA based on the recognition of unreliable peer node storage.

If the determination in step 708 finds that it is an initial upload or insertion of data (“initial”) then an initial data dispersement strategy is identified which may usually include optimization and tuning for the current set of peer nodes and possibly relative to the user data (step 710). The data is then dispersed (all of the data for an initial dispersement, though the dispersement may be done in pieces if individual file based or in some block that is less than all of the data to be dispersed) to peer nodes according to the current dispersement strategy (step 712). The current dispersement strategy may be the initial strategy if this is the first upload or a dynamically modified and revised strategy if there have been earlier dispersements. After dispersement, or even during dispersement if the system finds a peer node that was going to be used, the system monitors and/or verifies the continued reliability of each peer node on which a user's data is stored (step 714). It may do this for an individual user's data as a set or maintain a reliability status for all user data nodes. The monitoring may occur in any order and the results are maintained in the database for each node. A determination is then made as to whether any peer node has become unavailable or unreliable (step 716). If the determination as to whether any peer node is no longer reliable is negative (No) (step 716), then an additional optional determination may be made as to whether there is any new data to be added to the user's backup (step 722). If the answer is no, then the system and method continue to monitor and/or verify the continued reliability of each peer storage node on which a user's data is stored (step 716). On the other hand, if there is new data to be added (step 722) then the data to be added is identified (step 702), optionally compressed (step 704), and optionally encrypted (step 706). Since this is added data, the determination as to whether this is an initial data set or an added data set is positive (yes) (step 708), and the method continues by determining a revised data dispersement strategy using only currently reliable peer storage nodes (step 720). The new data, and optionally the new and the initial data, is dispersed to peer storage nodes according to the currently identified dispersement strategy (step 714). It may be appreciated that steps 712, 714, 716, 718, and 720 will repeat continuously to dynamically manage the storage and dispersal of the user's data. The procedure may be considered to deviate when new data is inserted into the system, or they may be considered to be two independent processes where the existing data is continually monitored, even as new data is added, and then the monitoring continues in its next cycle with the larger set of data and potentially larger set of peer nodes.
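One pass of the monitoring portion of the loop may be sketched as follows. The sketch covers only the reliability check, the revised strategy restricted to currently reliable nodes, and the reassignment of affected segments; the function name, the placement mapping, and the candidate list are hypothetical, and the actual movement of data is omitted.

```python
from typing import Callable, Dict, List, Tuple


def run_cycle(placement: Dict[str, str],
              is_reliable: Callable[[str], bool],
              candidates: List[str]) -> Tuple[Dict[str, str], bool]:
    """Return (possibly revised placement, whether anything changed).

    placement maps a segment id to the peer node currently holding it.
    """
    # Monitor each node holding a segment and note the ones that failed.
    failed = {node for node in placement.values() if not is_reliable(node)}
    if not failed:
        return placement, False

    # Revised strategy: only reliable nodes that do not already hold a segment.
    in_use = set(placement.values())
    spares = [n for n in candidates if is_reliable(n) and n not in in_use]

    revised = dict(placement)
    for segment, node in placement.items():
        if node in failed:
            if not spares:
                raise RuntimeError("not enough reliable peer nodes")
            revised[segment] = spares.pop(0)   # redisperse this segment
    return revised, True
```

Calling run_cycle repeatedly, and again after new data is added to the placement, mirrors the continuous management described above.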

It may also be appreciated that although this process has been described relative to a single user's data, the process may also be applied to all of the data on the system for all users and all nodes.

FIG. 8 is an illustration showing an exemplary embodiment of a method for retrieving previously stored user data from the system and service. The procedure starts (step 802) by identifying data or a data set to be retrieved or restored (step 804). Next, the current set of peer storage nodes for the identified data is identified from the IDA dispersement storage vectors (or other identifiers) and, because they are not all required for retrieval, a subset of the peer nodes on which the data is stored is identified according to some rule, policy, or at random (step 805). The data from the plurality of peer nodes is then communicated or transmitted over the network to the node identified as the retrieval node (which may in some instances be the server) and stored at least on a temporary basis there (step 808). Because each peer node will send only a segment of a multi-segment data set, undispersed data is generated from the plurality of segments received from the plurality of peer nodes (step 810). Any data that was encrypted is decrypted (step 812) and any data that was compressed is decompressed (step 814). The retrieved and reassembled data is now restored to the retrieval node, usually by the original owning user, and restored to the file system (step 816).
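The selection of a subset of peer nodes for retrieval may be sketched as follows, using the random-choice option. The function name and parameters are hypothetical; a rule or policy based choice (for example, preferring nearby or fast nodes) could replace the random sample.

```python
import random
from typing import Iterable, List, Optional, Set


def choose_retrieval_nodes(holders: Iterable[str], online: Set[str],
                           required: int,
                           rng: Optional[random.Random] = None) -> List[str]:
    """Pick `required` reachable nodes from those holding the data."""
    reachable = [node for node in holders if node in online]
    if len(reachable) < required:
        raise RuntimeError("too few peer nodes reachable to restore the data")
    # Random selection; pass a seeded rng for repeatable behaviour.
    return (rng or random.Random()).sample(reachable, required)
```

After the segments arrive from the chosen nodes, the remaining steps run in the reverse order of insertion: undisperse, then decrypt, then decompress.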

An exemplary use scenario is now described beginning with a user's initial steps at registration and continuing through an initial backup or insertion of data into the system and service.

In one embodiment, the backup and content or data retrieval and restore service is free to the end user or subscriber. In one embodiment of a free user service, revenues for operating the service and any profits may be derived from advertising, from partnering arrangements, a combination of these, or from other sources. In another embodiment, a fee may be charged to a user or to a group of users for using or accessing the service or content or data stored by the service. The fees may be fixed, or may differ depending upon the number and/or size of data stored, the number of accesses in a given period of time, the interaction or non-interaction with service partners, or according to other factors.

A non-limiting exemplary use scenario and associated operation is now described from the perspective of a new user accessing the service for the first time relative to FIG. 6. This process may be highlighted as follows with further explanation of optional elements in the following paragraphs as well as elsewhere in this specification.

This procedure 600 may be summarized as follows and is depicted in the exemplary flow chart in FIG. 6. First, the system presents the user with a user registration interface (step 602). The system then receives a request from the user for registration (step 604) and, in response thereto, downloads a client applet or program to the user's computer (step 606). The applet or computer program is installed on the user computer (step 608). Next, the system receives registration information from the user and records either a system assigned or user chosen user ID and password (step 610). In the exemplary embodiment, the system receives a post-registration user login request with an ID and password (step 612). The system verifies the user identity with the user ID and password (step 614). Optionally the system surveys the files, folders, data, and/or content of the user's computer and suggests a backup strategy to the user (step 616), and may receive an identification by the user of files, folders, data or content to backup or store to the service (step 620). The system optionally compresses and encrypts the identified user data and then determines an initial information or data dispersement strategy among a plurality of network peer storage nodes (step 622). The system may then initially disperse the user data according to the dispersement strategy (step 624). The system service may then monitor peer node availability and reliability and redisperse according to established redispersement policy (step 626). This last step may be performed iteratively, and the user may also add additional data to the backup storage set, which may also result in additional data dispersal and possibly redispersal of other data that was earlier uploaded to the system and service.

A user desiring to become a registered user or subscriber of the inventive service may initially access a service web site which may have been identified to her/him by various methods. The user may be presented with a menu or a button inviting the user to register, and the user then presses a hot spot or button on the display or otherwise initiates downloading of a thin client software application program or applet. In one embodiment of the invention, agreeing to download some form of computer program code, applet, plug-in, or the like is required for registration as a user entitled to store or backup their data on the network. Other embodiments of the invention, including some Web access-based use, may utilize generic web browser code and may not require downloading of service specific software; however, this type of use may have limitations as described elsewhere herein.

Versions of the service client are available for different computing platforms such as IBM compatible PCs and Apple Computer Macs, as well as other computing platforms or entertainment systems, devices, or other content generation or storage devices. In one embodiment, the user may be presented with a list of systems or devices, or asked to identify their system or device type. In another embodiment, a single client program is compatible with a plurality of device types so that no user selection is required.

The applet, software, plug-in, or other client software or code is then installed, either automatically, or under control of the user or by an installation wizard interacting with the user and the user's computer, information appliance, or system. It is anticipated that computers, information appliances, entertainment systems, and media generation and playback means may change over the coming years so that it should be understood that computer and/or information appliance are intended to include their common and usual meanings as well as systems and devices that have a capability to generate and/or store data, files, or other content, possibly including but not limited to moving or still pictures and images, music, voice recordings, text documents, business documents, spreadsheets, and any other type of digital information.

Once the applet is installed, the applet, or continued interaction with the web site, will ask the user for a user identification (ID) and a password. Alternatively, the system may assign a user ID and either a permanent or temporary password. The identification may be any name, number or other identifier that the user (or system) may care to use or assign. In at least one embodiment of the invention, the system does not store the user password but only stores and relies upon a match to the hash of the password when a registered user attempts a login. In one embodiment, the registration process may require or request that the user input additional information, such as for example but not limited to home and/or business address information, full legal name, telephone number, areas of interest, password recovery related information, or other information that may be desired for security, marketing, system and service improvement, or other purposes. Privacy policies may also be presented and the user requested to approve such privacy policy. However, in at least one embodiment only an account identifier and a password are required for registration of the user who then becomes a subscriber.
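Storing and matching only a hash of the password may be sketched as follows. The specification states only that a hash is stored; the per-user random salt, the PBKDF2-HMAC-SHA256 function, the iteration count, and the function names are assumptions added for illustration.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor, not taken from the specification


def make_record(password: str) -> tuple:
    """Return (salt, digest) to store in place of the password itself."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                                 salt, ITERATIONS)
    return salt, digest


def verify(password: str, salt: bytes, digest: bytes) -> bool:
    """Recompute the hash for a login attempt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                                    salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)
```

At login the service recomputes the hash from the submitted password and the stored salt, so the password itself never needs to be kept on the management server.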

The client applet, once installed, can then communicate directly back to the service management server over the Internet, web, or other network connection or communication established between the user computer or information appliance and the service management server. The client then registers the user with the user identifier and user password. Secure communication schemes as are known in the art may be utilized.

In one embodiment, after registration has been completed, further interaction may be performed between the user and the service using an Internet web based interface. Other embodiments of the invention may provide for direct connection or non-Internet based interaction. Other embodiments of the invention may be deployed and supported over intranets.

The web site interface provides a login page. In at least one embodiment of the invention, the web site may also provide a registration page. After registration from the web page, or during such registration procedure, the client software may be downloaded as described.

In one non-limiting embodiment, when the registered user logs into their account, they will see a list or other presentation of all of their files or content that is backed up or stored on the peer nodes. Initially, this list or presentation may be empty or blank since they will not have uploaded any files, folders, or content to the service. They may also optionally be presented with a list or other presentation of content on their own computer, with optional indications as to what has been backed up and what has not been. Graphics and colors may advantageously be used to highlight backed-up and/or non-backed-up files, folders, or other content. In one embodiment, these lists are generated and/or maintained by the service client software executing an inventory procedure on the user's computer or other information appliance. In another embodiment, the service management server may query and examine the user's computer or information appliance directly, but this is not preferred. In one embodiment, this inventory may be performed periodically or according to other rules or policies, and/or at the request of the user.

In one embodiment, when the registered user first logs into the service, the service recognizes that this login is the user's first post-registration login (or recognizes that the registered user has not yet identified any files, folders, or other content for backup), and presents a backup wizard to assist the user in his/her interaction with the service. Various interactions or dialogs may be used, and the use of wizards is known in the art of computers and is not described in detail here.

The wizard may ask the user what files, folders, or content the user wants to backup. The client software has separately and in the background done a search of all or an identified portion of the user's computer or information appliance or device, and identified files, folders, or content that may be appropriate for backup by the service. In one embodiment, the client may optionally request that the user limit, or otherwise direct that the client or service limit, the search to particular storage devices, folders, files, file types, content, or content types, or according to any other criteria. This may for example be done for user privacy reasons. The client and/or service may also optionally constrain the files, folders, or content. By way of example, but not of limitation, the client or service may constrain the backup according to maximum file sizes, a maximum total backup file or content size, to particular file or content types, or according to other criteria. Various search filters may be provided by the client or service to assist the user in identifying files or content for backup. The client or service may optionally also provide means for identifying copyrighted content or other content that may be subject to digital rights management.

In at least one embodiment, the service is provided as the user's backup and the files or content are stored in a manner (described herein elsewhere) that makes access by anyone other than the user impossible, so that even the backup of legitimately obtained copyrighted material does not present any copying or file sharing issues. In one embodiment, copyrighted material having a digital rights management feature may be backed up by the user, but prevented from being restored to an account other than the account associated with the registered user. More particularly, embodiments of the invention may optionally provide for a form of content management that permits user created files to be shared with an identified group (such as a limited number of friends and family members) associated with the registered user, but that may prevent files or content that may be subject to copyright from being shared with other registered or non-registered users. In at least one embodiment, a maximum group size is provided so that less than the entire world is permitted access to a registered user's files or content on a shared basis. Embodiments of the invention may also or alternatively provide that some maximum number of file or content sharing logins or accesses may occur within a defined period of time, or according to other criteria.

Returning to the description of the procedure, the wizard may for example recognize that there are pictures in a “My Pictures” folder, that there are music files or content in a “My Music” or “My iTunes” folder, and recommend that certain files, folders, or other content are appropriate for backup by the service.

If the user then indicates, such as by clicking a button on the web screen, that the user does wish to backup some set of files, folders, or content, then the service server will communicate back to the client that the identified files, folders, or content should be backed up. In one embodiment of the invention, an optional version monitoring and control may be provided so that a creation or modification date of a file, folder, or content item that has an otherwise identical name is monitored and a determination made as to whether that item is a duplicate or a newer replacement, and whether it should be overwritten or replaced in the backup, stored as a second copy bearing a version identifier or number appended to the file, or subjected to other action taken autonomously by the service or in response to the registered user's input.

The server communicates the criteria for backing up the user files, folders, and/or content, and the client is thereby made aware of files (or file types), folders, and/or other content that should be backed up. In one embodiment, the server identifies to the client that files or content in a defined set of folders should be backed up. In another embodiment, the server identifies to the client that certain new file types should be backed up independent of the folder, so that by way of example but not limitation, all JPEG (*.jpg) picture files or all MP3 (e.g., *.mp3) music files should be backed up if they are detected anywhere on the user's computer or information appliance. The identification of folders is advantageous as it reduces the search and computational burden of the client when it performs a search. In one embodiment of the invention, the client or the applet or program element may optionally update a database or list whenever a new file or content of an identified file or content type is created or downloaded to the user's computer or information appliance, thereby eliminating the need to perform a search. In one non-limiting embodiment, the client or other applet or program element executing in the user computer or information appliance may monitor the number, total size, last backup, and/or other information associated with files, folders, or content and recommend a backup be performed.
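The folder based and file type based criteria may be sketched as a single client side test. The function name and the use of POSIX style paths are assumptions, and the matching is made case-insensitive here as a convenience that the specification does not require.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath
from typing import List


def should_back_up(path: str, folders: List[str], patterns: List[str]) -> bool:
    """True if the file lies under a designated folder or matches a pattern."""
    p = PurePosixPath(path)
    # Folder criterion: any designated folder is an ancestor of the file.
    in_folder = any(PurePosixPath(folder) in p.parents for folder in folders)
    # File type criterion: the file name matches a pattern such as *.jpg.
    type_match = any(fnmatch(p.name.lower(), pattern.lower())
                     for pattern in patterns)
    return in_folder or type_match
```

For example, with the folder "/home/ann/My Pictures" and the patterns "*.jpg" and "*.mp3", a file anywhere beneath that folder qualifies, as does "/tmp/song.MP3" by type alone.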

Since the client is now aware of these folders, file types, content types, or other backup criteria, the service client program may optionally but advantageously monitor or look at the folders (which may for example be the entire storage device or devices connected to or otherwise identified with a user's computer or information appliance, or only a folder or subfolder of that or those devices) for new files or content meeting the backup criteria. This monitoring may be performed according to some rules or policies, and may by way of example but not limitation, monitor more or less continuously, at periodic intervals set by the system or user, or according to any other procedure.

Initially, when the user first identifies the folders to be backed up, all of the files, subfolders, or other content meeting the backup criteria will be backed up. Subsequently, the client will monitor for any newly added files or content and backup that content. In one embodiment, newly added files are backed up by adding the additional files or content to the previous backup set. In another embodiment, the newly added files are backed up by creating a second, third, fourth, or subsequent backup set so that a single user may have more than one dispersed data set backup. In one embodiment, the multiple backup sets are maintained separately over a period of time, though the user may not be aware of this separation, which is transparent to the user. In another embodiment, any multiple backup sets are recombined according to predetermined or dynamically determined policies or rules. The rules or policies may for example take into account such factors as the number of separate backup sets, the size of any one or more of the data backup sets, the frequency with which the user adds files, folders, or other content, the availability of processing power and/or bandwidth to perform any required compression/decompression, encryption/decryption, and/or dispersal to the same or a different set of peer storage nodes. In one embodiment of the invention, files (or blocks of data) identified as being deleted by a user are flagged or otherwise identified as being deleted in a files table in the database on the management service. In one embodiment, the files even though marked as deleted may be retained so that they are still recoverable if the user changes their mind or made a mistake. In one embodiment, a rule or policy may be utilized so that the files are deleted after some predetermined or dynamically determined period of time. In another embodiment, the user is sent a message requesting verification of file deletion. In one embodiment, this verification is requested at the time the files are deleted, while in another embodiment, the verification is requested at a later date. In one embodiment, the later date may be between a month and a year after the user indicates the files are to be deleted.
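Flagging files as deleted while keeping them recoverable for a retention period may be sketched as follows. The FileRecord structure, the function names, and the 30 day default are hypothetical; the specification leaves the period to a predetermined or dynamically determined policy.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class FileRecord:
    """One row of a hypothetical files table on the management service."""
    name: str
    deleted_at: Optional[datetime] = None   # None means not flagged


def flag_deleted(record: FileRecord, now: datetime) -> None:
    """Mark the file as deleted without removing its stored segments."""
    record.deleted_at = now


def restore(record: FileRecord) -> None:
    """Undo the flag if the user changes their mind or made a mistake."""
    record.deleted_at = None


def purge_due(record: FileRecord, now: datetime,
              retention: timedelta = timedelta(days=30)) -> bool:
    """True once a flagged file has outlived the retention period."""
    return record.deleted_at is not None and now - record.deleted_at >= retention
```

A verification message to the user could be sent either when flag_deleted is called or when purge_due first returns True, matching the two embodiments described above.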

In one embodiment, the backup methodology is tuned to a consumer market segment in which pictures, images, video, music files, and similar consumer oriented content is created and placed onto the user computer or information appliance or downloaded from another source and not changed; therefore change or version control is not required or even necessarily useful. In one embodiment, a change history for a file or files may optionally be maintained if desired. Therefore the primary goal of this non-limiting embodiment is to identify new files or content by name that has not been previously backed up and either back it up or identify it for backup at the next scheduled backup. Various different backup initiation criteria may be applied, as it may be appreciated that it is not necessarily efficient for a backup to be performed immediately following creation or downloading of a new file or content. In one embodiment of the invention, a backup may be performed according to a time schedule, according to a number of new items that have been identified for backup, according to a total file size of files or content identified for backup, or according to a combination of these criteria alone or in combination with other factors or criteria.
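A combination of the backup initiation criteria may be sketched as a single test. The function name and all three limits are hypothetical policy values; this sketch starts a backup when any one criterion is met, which is only one of the possible combinations.

```python
def backup_due(hours_since_last: float, pending_count: int, pending_bytes: int,
               max_hours: float = 24.0, max_count: int = 50,
               max_bytes: int = 500_000_000) -> bool:
    """True when the schedule, item count, or total size criterion is met."""
    return (hours_since_last >= max_hours      # time schedule
            or pending_count >= max_count      # number of new items
            or pending_bytes >= max_bytes)     # total size of new items
```

Batching in this way avoids running compression, encryption, and dispersal for every newly created or downloaded file.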

In other embodiments of the invention, change, modification, and other revision control may be provided so that the user may at least be made aware that multiple versions of the same named file or content may exist on their computer or information appliance and/or in an existing backup or backup to be created. A user may then be given an opportunity to choose how such changed, modified, or revised files or content should be handled. In one non-limiting embodiment, a software program or tool may be used or provided to identify differences between multiple versions of a file or content.

In one non-limiting embodiment, small or low-resolution so-called thumbnails may optionally but advantageously be created from at least certain picture or image types and stored on the service server so that if a user wants to review what is backed up and cannot associate a file name with its content, such as for example, one of the common digital camera file names like “DSC_0257.JPG”, the user may view a thumbnail image of that file. Storage of the thumbnail on the server also alleviates any possible need to retrieve distributed portions of the user's backup from a plurality of nodes and perform any decryption and/or decompression that would be required to view the backed up user image file. In one non-limiting embodiment, the invention may also provide reduced resolution versions of other file or content types, such as by way of example but not limitation, thumbnails or equivalents of Adobe™ Acrobat files, Microsoft™ Word documents, spreadsheet documents, or any other type of document or file. In yet another non-limiting embodiment, short versions (for example a few seconds) of music files, video files, or other audio or media content may optionally but advantageously be stored on the server for similar purposes of review by the registered user associated with that content (and, when optionally provided, by an authorized group associated with that registered user).

The optional provision of image and/or audio thumbnails provides significant advantages for a user's review of backup and retrieval as well as for retrieval or viewing from a computer or information appliance different from the computer or information appliance that actually may still store the original images or music files. This situation may occur, for example, either when a registered user needs to access or restore files or content to a computer or information appliance that is different from (and possibly geographically remote from) the computer or information appliance where the originals are stored, or when a member of the registered user's group (e.g., friends and family) wants to access the service and view only selected ones of the user's content items, particularly images or pictures. Recall that controls may be provided that may control, moderate, or limit multiple simultaneous access that might be in violation of copyright or other file or content sharing restrictions. In one embodiment, different restrictions may optionally be implemented for different users based on such factors or criteria as the user's registered country or state, or the location of the user's computer or information appliance based on signals from the user's wired or wireless network interface. Restrictions on registration or access may also be implemented based on a registered user's or associated group member's (e.g., guest's) identified age, country, or geographical location, or according to other factors or criteria.

The registered user may also optionally identify other users or potential users (also referred to herein as guest users) with folders, subfolders, files, content items, and/or content types. In one embodiment, guests are identified by the registered user to the service using their email addresses. Other embodiments may use other identification means.

Guest users may be registered users or non-registered users when they are identified by the registered user. In one embodiment, guest users are either required to register or are requested to register. In one non-limiting embodiment, a guest user registers and obtains a new user ID and their own password. The user ID and password may only be used in association with their access to the registered user's group. A guest user may be a member of many different groups and may either have different IDs and passwords for each group, or have a single ID and password that permits access to all of the groups with which they have been associated.

In one embodiment, guest users may remain unregistered even after accessing a registered user's content, or may register. In many instances a guest user may not have their own content that needs backup, so that there may be no motivation to register. One example may be an elderly gentleman who has a computer or information appliance but does not create, download, or otherwise have a need for backup but wants to be able to view and occasionally download pictures of his granddaughter or grandson.

Again, the number of guest users may be limited so as to prevent at least the appearance of offering a file sharing service, particularly if some of the backed up content has use restrictions associated with the content. However, for user authored or other content to which no use restrictions apply, the number of guest users may be unlimited or substantially unlimited. In one non-limiting embodiment, a burden may be placed on the registered user to separate others' copyrighted content from content not subject to use or sharing restrictions. In another embodiment, the service attempts to identify content that is or may be subject to use or sharing restrictions, and to prevent sharing of such content. In one embodiment, the service may also limit or prevent the generation of image or audio thumbnails of such content.

It may be appreciated that even when some file or content sharing may be permitted, the service is providing only a limited private shared network within a registered user's share group and is not a public file sharing network. Embodiments of the invention may provide for limiting the number of guest or share users. For example, non-limiting embodiments of the invention may provide for 10 share users, 20 share users, 50 share users, or any other number of share users associated with a registered user. Since access by share users is controlled by the service management server, the number of share users may be strictly limited.

In one embodiment, the sharing is performed at the folder level for ease of administration. For example, a registered user may identify the contents of their “My Pictures” folder, or the contents of a “Laura's Birthday 2006” folder as being a shared folder. Other schemes for identifying shared content may alternatively be implemented.
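A minimal sketch of a server-enforced share group, combining the guest limit and the folder-level sharing described above, might look like the following. The `ShareGroup` class, its fields, and the default limit of 10 are hypothetical and are given only as one possible arrangement.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class ShareGroup:
    """Hypothetical share group record kept by the service management server."""
    owner: str
    max_guests: int = 10  # e.g., 10, 20, or 50 share users
    guests: Set[str] = field(default_factory=set)          # guest email addresses
    shared_folders: Set[str] = field(default_factory=set)  # e.g., "My Pictures"

    def invite(self, email: str) -> bool:
        """Add a guest by email address; refuse once the limit is reached."""
        email = email.strip().lower()
        if email in self.guests:
            return True  # already a member, nothing to do
        if len(self.guests) >= self.max_guests:
            return False
        self.guests.add(email)
        return True

    def can_view(self, email: str, path: str) -> bool:
        """Allow access only to guests, and only inside a shared folder."""
        if email.strip().lower() not in self.guests:
            return False
        return any(
            path == folder or path.startswith(folder + "/")
            for folder in self.shared_folders
        )
```

Because every access check passes through the server-side record, the limit on the number of share users can be enforced strictly regardless of how many invitations the registered user attempts to send.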

When the service server receives the email addresses of a registered user's group members, the server sends an email message to each of the identified group members informing them of their status as a member of a registered user's share group. In one embodiment, the email message includes a URL link, and the message invites the email recipient to click on the link to access the service. In one non-limiting embodiment, the email message may include a personalized message from the registered user. For example, the message may say that “Michael has decided to share his pictures with you, click on the link to access the web site to see his pictures.” In another embodiment, the email message may include one or more content thumbnail images belonging to or associated with the registered user.

The recipient may then click on the link to access the web site associated with the link and see files or other content that they have been invited to share. This linked web page may be the same web page as, or a different web page than, the web page a registered user accesses to log in to the service. If the recipient of the email is already a registered user, then when that registered user accesses the linked web page, he/she may log into their own account and then view a list of share groups that they belong to. Various interfaces may be provided and the interfaces described here are provided as examples and are not to be construed as limiting in any way.

In the event that the recipient is not a registered user but wants to access the shared files or content, the invited guest may access the shared files without needing to download the client as was otherwise required for a new user wanting to use the service for file, folder, or content backup.

In one embodiment, when the registered user identifies one or a plurality of share group members, entries are written into the database that identify the share group members, and optionally provide a unique but temporary password for at least an initial login by the guest share member. The guest share member, if not already registered, will be asked (or forced) to change the password after logging in. The invited guest or share user will then be asked if they would like to become a registered user entitled to use the free service for backing up their files, folders, or other content. The invited guest user may then be encouraged to register so that not only may they view others' shared content, but may also utilize the service for their own backup and optionally to share some or all of their own content with their own share group.
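As a non-limiting illustration, the issuance of a unique temporary password and the forced change on first login might be sketched as below. The `GuestAccount` class and the helper functions are hypothetical, and the use of a salted PBKDF2 hash is an assumption made here for the sketch; the specification does not name a particular hashing scheme.

```python
import hashlib
import hmac
import secrets
from typing import Tuple


def hash_password(password: str, salt: bytes) -> bytes:
    """Salted hash so that the database never holds the password itself."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


class GuestAccount:
    """Hypothetical database entry written when a share member is identified."""

    def __init__(self, email: str) -> None:
        self.email = email
        self.salt = secrets.token_bytes(16)
        self.pw_hash = b""
        self.must_change = True  # guest is forced to change the temporary password


def create_guest(email: str) -> Tuple[GuestAccount, str]:
    """Create the entry and return the temporary password for the invitation."""
    account = GuestAccount(email)
    temp_password = secrets.token_urlsafe(12)
    account.pw_hash = hash_password(temp_password, account.salt)
    return account, temp_password


def login(account: GuestAccount, password: str) -> bool:
    """Constant-time comparison of the stored and supplied password hashes."""
    return hmac.compare_digest(account.pw_hash, hash_password(password, account.salt))


def change_password(account: GuestAccount, old: str, new: str) -> bool:
    """Replace the temporary password; reusing the old one is refused."""
    if not login(account, old) or new == old:
        return False
    account.pw_hash = hash_password(new, account.salt)
    account.must_change = False
    return True
```

A web interface built on such a sketch would check `must_change` after a successful `login` and redirect the guest to a password change form before showing any shared content.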

In one embodiment, the service may generate revenue based on advertisement (ad) placement either in the form of banner ads, pop-up ads, or other forms of ad placement known in the art. Revenues may also be generated based on ads presented, ads that have been clicked through, generated sales, or other advertisement or sales based models as are known in the art. Revenues may also be generated, for example, by partnering with picture or photo printing entities so that a registered user and/or share group guests may be presented with offers for Internet or web based picture printing at favorable rates. In one embodiment, a database may optionally be maintained to identify digital or electronic images that have never been printed in a hardcopy format, so that not only does the registered or guest user have an opportunity to maintain an archival backup but also a convenient means for obtaining printed photographs. Similar offers may be presented for DVD compilations of pictures, for user generated video content, or for other files, folders or content. In one embodiment, the invention provides for receipt of a share of revenues or profits derived from a user's content and offers presented to the user when they access the service. It may therefore be appreciated that service revenues may be increased by generating excitement at the service web site and that frequent visitation to the service website by registered users and invited share group members is to be encouraged. In this regard, various promotions, contests, and/or incentives as may be permitted by law may be presented on the web site by the service.

In the event that the invited guest share group member is already a registered user, they will be invited to enter their own registered user information and password, to obtain access to not only their own account but also to share groups that they are associated with. In at least one embodiment, this prior registration may eliminate any need for a new account or password.

It may be appreciated that since, in at least one embodiment, the service is a free service, no revenue will be generated from a registered user who uses the service only to provide an emergency backup in the event the user's own file storage system fails and who, in the most extreme case, never again accesses the service. However, since the cost for storing an incremental user's files, folders, or other content is relatively small, there is little or no cost or loss for this type of user. Revenue is advantageously generated by the above advertising and partnering revenue models. In general, the more frequently a user accesses her/his content (or their share group members access the content), the more opportunity for revenue generation. The provision of share group members' email addresses also provides an opportunity for directed advertising. In one embodiment, the share group member may be sent messages indicating that additional content is now available by clicking a link. Advertisements may be presented in the email itself, or through the link, or at the service web site when the recipient attempts to view the new content. In one embodiment, an order for prints of new pictures may be provided using the thumbnails. The share group user may then follow through with the print (or other media) order or edit it in some way before placing the order. The order may require further identification of the user, such as a name and mailing address, as well as credit card or other payment information. These purchase interfaces may advantageously be performed over a secure connection as is known in the art.

In at least one non-limiting embodiment, advertisement and/or partnering relationships may be customized or personalized based on a perceived registered or guest user actual or derived characteristic, and/or based in whole or in part on information derived from a registered user's stored content, and/or from the content identified for sharing to members of a registered user's one or plurality of share groups, and/or from the content that is actually viewed or otherwise accessed by one or a plurality of members of the share group. Advantageously, the service will take due regard for registered user and guest share group member privacy and either inform the user and/or guest of its privacy policy and/or obtain permission before performing an analysis of the content, data mining, access patterns, user or guest profiles or purchasing patterns or the like.

It may be appreciated that the service server only stores administrative information to permit recovery and reconstruction of a user's files, folders, and content; it stores the file, folder or content name but it does not store the user's actual original data. In some embodiments, the service server may store thumbnails to assist the registered user and any share group members in accessing the original data. This more limited storage provides at least somewhat of a privacy advantage as compared to other file backup services, which may typically have access to a user's entire content on a server. Recall also that none of the storage nodes individually store sufficient information to construct any single file, folder, or content item; or in the event that a single file, folder, or content item may be of such size (e.g., small) and character that it is dispersed only onto a single node's storage device, the effort to identify, decrypt and decompress that item would make access to that item impractical.

Non-limiting embodiments of the invention may provide for advertising based on file or content names alone. Other non-limiting embodiments may analyze picture or photo content and provide advertising based on identified content or subject type. Other non-limiting embodiments may analyze music content and provide advertising based on identified music content or type. Still other non-limiting embodiments may collect and store metadata associated with picture files, audio or music files and the like, and this may provide an additional basis for extracting context information that may be used for directed or personalized advertising and marketing.

Although the management server does not store user content, and in many embodiments of the invention does not even process or touch the content, the management server may have access to full uncompressed content, including for example video content. This provides an opportunity for partnering in the form of sharing content with other organizations as well as with individual guests. The sharing may be user permission based either at the time the user registers, or at a different time. In one non-limiting embodiment, a registered user may include a backup folder for video content. The client or server may identify this content as being suitable for upload and sharing on the Google acquired YouTube, Inc. video web site, and the user may be asked or incentivized not only to perform the backup, but additionally to upload the content to the Google acquired YouTube, Inc. video content posting and share web site. The content may optionally be passed through a conversion or transformation process or filter, so that the content to be shared with the Google acquired YouTube, Inc. video site is in a compatible format. Similar or analogous processes may be provided to communicate or post content to other potential partnering sites such as My Space or other social networking sites.

The inventive system, method, and service may also be beneficial in a business or corporate environment. In such an implementation, the network may be a closed network or intranet rather than the Internet, or may include components of an internal closed intranet and the Internet. In one embodiment, a company may have terabytes of unused space that may be offered to employees for work based and/or personal file backup at no cost to the employee or to the company. In this case, the system and software may be offered on a non-exclusive licensing basis and revenues collected on this basis.

Other optional but advantageous features may be provided. For example, once the peer-to-peer storage network is in place and backing up the user's files, additional value-add services are offered to the user, including web-based access, file sharing and partnering with other sites that require user content. These additional services leverage the features of the peer storage network to provide the functionality.

The first value-add service is a true web-based access method to a user's own files. This web-based service provides full access to all the backed up content from any web terminal. Since the actual storage is in the peer storage network, the user's own PC does not need to be on or reachable at the time of remote access in order to browse and download (restore) the files from the set. Restore is just a special case of remote access and thus is achieved via this same method.

The second value-add service is a web-based sharing service. Since the most likely backed-up content by consumers is digital photographs and other user-created content, it is the same content that users are going to want to share with friends and family. The same process of remote access and restore (limited to content the owner chooses to share) allows other designated users to access and share the content of these users.

Additional Description

As used herein, the term “embodiment” means an embodiment that serves to illustrate by way of example but not limitation.

It will be appreciated by those skilled in the art that the preceding examples and preferred embodiments are exemplary and not limiting to the scope of the present invention. It is intended that all permutations, enhancements, equivalents, and improvements thereto that are apparent to those skilled in the art upon a reading of the specification and a study of the drawings are included within the true spirit and scope of the present invention.

1. A server computer for operating a distributed data storage system having data security, redundancy, and retrieval features, the server including: a processor and a memory coupled to the processor; a network communications interface for coupling the server computer to a network; a database for storing data pertaining to the distributed storage in the distributed data storage system and coupled to or coupleable with the processor; a network node reliability monitor for monitoring the reliability of the plurality of nodes on which the data is stored and for generating storage node reliability information; and an information dispersal and control unit for initially dispersing data for backup storage to a plurality of network storage nodes and for dynamically redispersing the data over time according to the storage node reliability information.
2. A server computer as in claim 1, wherein the data comprises data associated with a plurality of different users.
3. A server computer as in claim 2, wherein the data comprises a file, a portion of a file, or block of data, and wherein the information dispersal and control unit uses an information dispersal algorithm to segment the file, the portion of a file, or the block of data to redundantly distribute the data to a plurality of storage nodes on the network.
4. A server computer as in claim 3, wherein the storage nodes comprise hard disc drives on user personal computers coupled to an Internet network.
5. A server computer as in claim 1, wherein the server provides strong security including separation of user data and user key information, the user never having access to its user key which is only stored on the management server, the server never storing the original raw user data, and the data storage nodes never having access to all the user data or user keys.
6. A server computer as in claim 1, wherein the data stored on the storage nodes is encrypted.
7. A server computer as in claim 1, wherein the database defines a data structure for a nodes table, a users table, a user key table, a files table, a password hash table, and a file backups storage vector table.
8. A server computer as in claim 7, wherein the database further defines a data structure for a tags table, a tag files table, a shares table, and a file objects table.
9. A server computer as in claim 1, further including means for compressing the data to be backed up on the storage nodes.
10. A server computer as in claim 1, further including means for encrypting the data to be backed up on the storage nodes.
11. A system for operating a distributed data storage system having data security, redundancy, and retrieval features, the system comprising: a server computer including: a processor and a memory coupled to the processor; a network communications interface for coupling the server computer to a network; a database for storing data pertaining to the distributed storage in the distributed data storage system and coupled to or coupleable with the processor; a network node reliability monitor for monitoring the reliability of the plurality of nodes on which the data is stored and for generating storage node reliability information; and an information dispersal and control unit for initially dispersing data for backup storage to a plurality of network storage nodes and for dynamically redispersing the data over time according to the storage node reliability information; and a plurality of user nodes, at least a first one of the nodes including a first user interface adapted for a first user to identify a data set for backup storage and at least a second and third different ones of the nodes adapted for storage of a portion of the first user data to be backed up.
12. A system as in claim 11, wherein the data comprises data associated with a plurality of different users.
13. A system as in claim 11, wherein the data comprises a file, a portion of a file, or block of data, and wherein the information dispersal and control unit uses an information dispersal algorithm to segment the file, the portion of a file, or the block of data to redundantly distribute the data to a plurality of storage nodes on the network.
14. A system as in claim 13, wherein the storage nodes comprise hard disc drives on user personal computers coupled to an Internet network.
15. A system as in claim 11, wherein the server provides strong security including separation of user data and user key information, the user never having access to its user key which is only stored on the management server, the server never storing the original raw user data, and the data storage nodes never having access to all the user data or user keys.
16. A system as in claim 11, wherein the data stored on the storage nodes is encrypted.
17. A system as in claim 11, wherein the database defines a data structure for a nodes table, a users table, a user key table, a files table, a password hash table, and a file backups storage vector table.
18. A system as in claim 17, wherein the database further defines a data structure for a tags table, a tag files table, a shares table, and a file objects table.
19. A system as in claim 11, further including means for compressing the data to be backed up on the storage nodes.
20. A system as in claim 11, further including means for encrypting the data to be backed up on the storage nodes.
21. A system as in claim 11, wherein the first user interface comprises a personal computer coupled to the server by a persistent or intermittent network communication link.
22. A system as in claim 11, wherein the at least a second and third different ones of the nodes adapted for storage of a portion of the first user data to be backed up comprise personal computers at different locations each having at least one storage device for storing the first user data.
23. A system as in claim 11, wherein the at least one storage device comprises a hard disc drive storage device of a personal computer.
24. A method for maintaining reliable distributed storage on a network comprising a plurality of data storage nodes, the method comprising: dispersing the data to data storage nodes according to the current dispersement strategy; monitoring and verifying the continued reliability of each peer storage node on which a user data is stored; determining if a storage node has become unavailable or unreliable; and redispersing the data to different storage nodes if it is determined that a storage node has become unreliable, and maintaining the current data dispersement if the storage nodes on which the data is stored are not determined to be unreliable.
25. A business method for generating monetary revenues from a distributed data storage system service having data security, redundancy, and retrieval features, the method comprising: providing a managed consumer backup service to a consumer without a user fee in exchange for the user providing storage for at least one other different user data; presenting advertisements to a user when the user interacts with the storage system service; and collecting revenues from the entities placing the advertisements.
26. A business method as in claim 25, further comprising collecting revenues from product and/or service partners associated with the storage system service.